Abstract

Quantum theory is a probabilistic theory with fixed causal structure. 
General relativity is a deterministic theory but where the causal structure 
is dynamic. It is reasonable to expect that quantum gravity will be a 
probabilistic theory with dynamic causal structure. The purpose of this 
paper is to present a framework for such a probability calculus. We define 
an operational notion of space-time, this being composed of elementary 
regions. Central to this formalism is an object we call the causaloid. This 
object captures information about causal structure implicit in the data 
by quantifying the way in which the number of measurements required 
to establish a state for a composite region is reduced when there is a 
causal connection between the component regions. This formalism puts 
all elementary regions on an equal footing. It does not require that we 
impose fixed causal structure. In particular, it is not necessary to as- 
sume the existence of a background time. The causaloid formalism does 
for probability theory something analogous to what Riemannian calculus 
does for geometry. Remarkably, given the causaloid, we can calculate all 
relevant probabilities and so the causaloid is sufficient to specify the pre- 
dictive aspect of a physical theory. We show how certain causaloids can 
be represented by suggestive diagrams and we show how to represent both 
classical probability theory and quantum theory by a causaloid. We do 
not give a causaloid formulation for general relativity though we speculate 
that this is possible. The causaloid formalism is likely to be very powerful 
since the basic equations remain unchanged when we go between different 
theories - the differences between these theories being contained in the 
specification of the causaloid alone. The work presented here suggests a 
research program aimed at finding a theory of quantum gravity. The idea 
is to use the causaloid formalism along with principles taken from the two 
theories to marry the dynamic causal structure of general relativity with 
the probabilistic structure of quantum theory. 



1 Introduction 



The two great pillars of twentieth century physics are general relativity (GR) 
and quantum theory (QT) and both have enjoyed considerable empirical suc- 
cess. It so happens that the domain where general relativity has been well 
verified corresponds to situations where quantum effects (such as superposi- 
tion) are negligible. And similarly, the domain where quantum theory has been 
verified corresponds to situations where general relativistic effects (such as mat- 
ter dependent curvature of space time) are negligible. Sufficiently sophisticated 
experiments would be able to probe domains where both quantum and general 
relativistic effects are significant. However, each theory is formulated in a way 
that requires that the particular effects of the other can be ignored and so, in 
such domains, we would not be able to make predictions. What is required is 
a new theory, a theory of quantum gravity (QG), which reduces to GR or to 
QT in the situation where quantum effects or where general relativistic effects, 
respectively, are small. The problem is that we want to proceed from two less 
fundamental theories (GR and QT) to a more fundamental theory (QG). How 
can we do this? One approach is to try to formulate one theory entirely in the 
terms of the other. For example, we might try to "quantize general relativ- 
ity". This is likely to work when one theory is clearly less fundamental than 
the other. However, GR and QT each bring fundamental notions to the table 
that cannot easily be accommodated in terms of the structures available in the 
other theory. Instead a more even handed approach seems favourable. This is 
problematic. It seems unlikely that we can combine two mathematical formula- 
tions of two different theories in an even handed way without stepping outside 
those mathematical formulations. Hence we adopt the following strategy. We 
will pick out essential conceptual properties of each theory and try to find a 
mathematical framework which can accommodate them. A historical exam- 
ple of this approach is provided by Einstein himself in his invention of special 
relativity which resulted from an attempt to combine Newtonian physics with 
electromagnetism. From Newtonian physics he took the Galilean principle of 
invariance for inertial frames and from electromagnetism he took the fact that 
the speed of light is independent of the source. These facts were set apart from 
their mathematical formulation in their original theories. Thus, Einstein stated 
Galileo's principle in words rather than giving it the usual mathematical expres- 
sion as the Galilean transformations. It was only having done this that he was 
able to avoid the mess associated with earlier attempts to reconcile Newtonian 
physics with electromagnetism in terms of the properties of an ether. Indeed, 
these earlier efforts tried to formulate electromagnetism within
the Newtonian framework.

With the implementation of this approach in mind, we note the following. 

1. General relativity is a deterministic theory with dynamic causal structure. 

2. Quantum theory is a probabilistic theory with fixed causal structure. 

Once the probabilistic cat is out of the bag it is unlikely that we will go back to 
a fundamentally deterministic theory. Likewise, once we have dynamic causal
structure it is unlikely that a more fundamental theory will have an underlying
fixed causal structure. Hence, we require a mathematical framework for physical 
theories with the following properties: 

1. It is probabilistic. 

2. It admits dynamic causal structure. 

In this paper we will find such a framework. We will show how QT can be 
formulated in this framework. We also expect to be able to formulate GR in 
the framework though we do not give an explicit construction. But, of course, 
the real point of this exercise is that we should be able to formulate a theory 
of QG in this framework. And, further, this framework should make this job 
easier. We will suggest possible approaches to finding a theory of QG within 
this framework. 

2 Overview 

In GR we introduce coordinates $x^\mu$. We can consider intervals $\delta x^\mu$ in these
coordinates. We do not say up front which of these intervals (or which linear
combinations of these intervals) is time-like. It is only after solving for the metric
that we can do this. The causal structure is dynamic. In quantum theory, on 
the other hand, we must specify the causal structure in advance. One way to 
see this is to consider different ways in which we might put two operators, A 
and B, together. If the two regions corresponding to these operators are space-
like separated then we use the tensor product $A \otimes B$. If the two regions are
immediately sequential (time-like) then we write BA. In order to know what 
type of product to take we need to know the causal structure in advance. We 
seek a new type of product which unifies these two products (along with any 
other products in QT) and puts them on an equal footing. 
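
The difference between the two products can be seen on small matrices. The following is a minimal Python sketch for illustration only; the helper names `kron` and `matmul`, and the choice of the Pauli X and Z matrices for A and B, are assumptions and do not come from the formalism itself.

```python
def kron(A, B):
    """Tensor (Kronecker) product A (x) B, used for space-like separated regions."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matmul(A, B):
    """Ordinary matrix product, used for immediately sequential regions."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Illustrative operators (assumed for this example): Pauli X and Pauli Z.
A = [[0, 1], [1, 0]]
B = [[1, 0], [0, -1]]

space_like = kron(A, B)     # 4 x 4 matrix acting on the composite Hilbert space
time_like = matmul(B, A)    # 2 x 2 matrix: A is applied first, then B
```

The two results do not even have the same dimensions, which is why the type of product has to be chosen before any calculation can begin.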

The approach taken in this paper is operational. We define an operational 
notion of space-time consisting of elementary regions $R_x$. An arbitrary region
$R_1$ may consist of many elementary regions. In region $R_1$ we may perform
some action which we denote by $F_{R_1}$ (for example we may set a Stern-Gerlach
apparatus to measure spin along a certain direction) and observe something
$X_{R_1}$ (the outcome of the spin measurement for example). Consider two disjoint
regions $R_1$ and $R_2$. Our basic objective is to find a formalism which allows us to
calculate the probability for something in one region, $R_1$, conditioned on what
happened in another region $R_2$ if this probability is well defined (we will explain
what we mean by a "well defined probability" in Sec. 15). Namely we want to
be able to calculate all probabilities of the form

$$\text{prob}(X_{R_1} | F_{R_1}, X_{R_2}, F_{R_2}) \qquad (1)$$

when well defined. We would like the formalism that does this to put every
elementary region $R_x$ on an equal footing.

To this end we introduce vectors $\mathbf{r}_{(X_{R_1}, F_{R_1})}(R_1)$ for $R_1$ (these are analogous
to operators in QT). Such vectors are defined for any region including the
elementary regions $R_x$. Given any composite region such as $R_1 \cup R_2$ we can find
the corresponding $\mathbf{r}$ vector using the causaloid product

$$\mathbf{r}_{(X_{R_1} \cup X_{R_2}, F_{R_1} \cup F_{R_2})}(R_1 \cup R_2) = \mathbf{r}_{(X_{R_1}, F_{R_1})}(R_1) \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})}(R_2) \qquad (2)$$

This means that $\mathbf{r}$ vectors for any region can be built out of $\mathbf{r}$ vectors for the
elementary regions, $R_x$, comprising this region. The causaloid product is given
by the causaloid which we will describe briefly in a moment. The r vectors for the 
elementary regions themselves are also given by the causaloid. If this formalism 
is applied to QT then the causaloid product unifies the different products in QT 
mentioned above. 

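We find that the probability

$$\text{prob}(X_{R_1} | X_{R_2}, F_{R_1}, F_{R_2})$$

is well defined if and only if

$$\mathbf{v} = \mathbf{r}_{(X_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})}$$

is parallel to

$$\mathbf{u} = \sum_{Y_{R_1}} \mathbf{r}_{(Y_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})}$$

(where the sum is over all possible observations, $Y_{R_1}$, in $R_1$ consistent with
action $F_{R_1}$) and this probability is given by

$$\text{prob}(X_{R_1} | X_{R_2}, F_{R_1}, F_{R_2}) = \frac{|\mathbf{v}|}{|\mathbf{u}|} \qquad (3)$$

where $|\mathbf{a}|$ denotes the length of the vector $\mathbf{a}$.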
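
The parallelism test and the length ratio can be sketched numerically. The Python fragment below is only an illustration under stated assumptions: the vectors `u` and `v` are made-up lists of real numbers standing in for the results of causaloid products (which are not computed here), the function names are hypothetical, and "parallel" is tested with a Cauchy-Schwarz equality up to a small tolerance.

```python
import math


def norm(vec):
    """Euclidean length |a| of a real vector."""
    return math.sqrt(sum(x * x for x in vec))


def is_parallel(v, u, tol=1e-9):
    """True when v is a scalar multiple of u (Cauchy-Schwarz equality)."""
    dot = sum(a * b for a, b in zip(v, u))
    bound = norm(v) * norm(u)
    return abs(abs(dot) - bound) <= tol * max(1.0, bound)


def conditional_probability(v, u, tol=1e-9):
    """Return |v| / |u| when the probability is well defined, otherwise None."""
    if norm(u) == 0 or not is_parallel(v, u, tol):
        return None  # the probability is not well defined
    return norm(v) / norm(u)


# Made-up stand-ins for the vectors u and v (assumed values).
u = [0.4, 0.2, 0.6]
v_parallel = [0.1, 0.05, 0.15]      # one quarter of u
v_not_parallel = [0.1, 0.2, 0.0]    # points in a different direction

well_defined = conditional_probability(v_parallel, u)
ill_defined = conditional_probability(v_not_parallel, u)
```

In this sketch `well_defined` is approximately 0.25 and `ill_defined` is `None`, mirroring the statement that a probability exists only when the two vectors are parallel.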

The causaloid is theory specific and is given by providing a means to calcu- 
late certain matrices (called lambda matrices). The lambda matrices quantify 
the way in which the number of measurements to determine the state is reduced 
due to correlations implied by the theory. We have lambda matrices for each 
elementary region (called local lambda matrices) and we have lambda matrices 
for every subset of elementary regions. Hence, at this general level all elemen- 
tary regions are put on an equal footing. In any specific theory we will expect 
that some lambda matrices will follow from others. In QT, for example, it turns 
out that we only need local lambda matrices and lambda matrices for pairs of 
adjacent regions. From these we can calculate all other lambda matrices. In 
a particular theory we will expect to break the symmetry between elementary 
regions by virtue of some particular choice of lambda matrices. For example, 
in QT the lambda matrix associated with a pair of adjacent elementary regions
is different to the lambda matrix associated with a pair of non-adjacent ele- 
mentary regions. However, the fact that we start with a formalism that does 
not impose any particular such structure from the very beginning puts us in a 
strong position to make progress in finding a theory of QG. 

The causaloid framework does not have, as a fundamental notion, the idea
of a state evolving in time. However, standard physical theories such as QT do.
Thus, to help us put QT in the causaloid framework, we will show how to
recover a notion of a state evolving in time in the causaloid framework. Having 
done this we are able to put classical probability theory and quantum theory 
into the causaloid framework. Having pulled QT into the causaloid framework 
we can leave behind the problematic notion of a state at time t. 

Any attempt to find a theory of QG in this program is likely to start by 
putting GR into the framework. We discuss how this might be done before 
considering issues that arise in QG. 

The important new technical results in this paper are contained in Sec. 1121 
to Sec. |2H1 These sections are fairly self-contained and the impatient reader can 
jump straight to those sections on a first reading of this paper (though perhaps 
skimming the earlier sections). 

3 Data 

We are looking for a framework for physical theories. But what is a physical 
theory and what does it do for us? There are many possible answers to these 
questions. But we take the following to be true: 

Assertion: A physical theory, whatever else it does, must correlate 
recorded data. 

A physical theory may do much more than this. For example it may provide 
a picture of reality. It may satisfy our need for explanation by being based 
on some simple principles. But for a physical theory to have any empirical 
content, it must at least correlate recorded data. This sounds like a rather weak 
assertion. But, as we will see, it provides us with a strong starting point for the 
construction of the framework we seek. 

The assertion above leaves unspecified what the word "correlate" means. 
This could be deterministic correlation, probabilistic correlation, or conceivably 
something else. Since we will be interested in probabilistic theories, we will take 
this to mean probabilistic correlation. 

What is data? We can compile the following list of properties that data has. 

1. Data is a record of (i) actions and (ii) observations. For example it might 
record statements like (i) I lifted the rock and let go (an action), and (ii) 
it fell and hit my toe (an observation). 

2. Data is recorded by physical means. For example it may be written on 
bits of paper, stored in a computer's memory, or stored in the brain of the 
experimentalist . 

3. Data is robust (it is unlikely to randomly change). 

4. Data can be copied so that new physical records exist. 

5. Data can be translated (e.g. English to French or binary to base 10).

6. Data can be moved around (e.g. in wires or on bits of paper). 

7. Data can be processed (e.g. to check that it is correlated according to 
some physical theory). 

The physicality of data may concern us a little - especially in those situations 
where we expect the physical systems which store, transport, and process the 
data to interfere with the physical experiment we are performing. To deal with 
this concern we make the following assumption 

The indifference to data principle: It is always possible to find
physical devices capable of storing, transporting, and processing 
data such that (to within some arbitrarily small error) the prob- 
abilities obtained in an experiment do not depend on the detailed 
configuration of these devices where this detailed configuration cor- 
responds to the particular data and programming (for the program 
which will process the data) whilst it is being stored, transported, 
and processed. Such physical devices will be called low key. 

Without such a principle the probabilities might depend on whether the exper- 
iment is conducted by an Englishman or a Frenchman (since the same data in 
English or French will have a different detailed physical configuration). This 
principle does not imply that the presence of the physical device which stores 
and processes the data has no effect on the experiment. But rather that any 
such effect does not depend on the detail of the data being stored. For example, 
a computer being used to record and process data from a nearby gravitationally 
sensitive experiment has mass and therefore will affect the experiment in question.
However, this effect will not depend on the detailed configuration of the
computer which corresponds to the data (or at least that effect will be arbitrar- 
ily small). Consequently the principle does not forbid the physical data devices 
from being part of the experiment. For example, we could throw a computer 
from the leaning tower of Pisa to gain information about how the computer 
falls. It might collect data through a camera about the time it passes successive 
levels of the building. In this case the data device is actually part of the experi- 
ment and the principle still applies. This means that we do not need to put the 
observer (for observer read "physical data devices") outside the system under 
investigation. The observer can be a part of the system they are investigating so 
long as they can store and process data in a low key manner. In fact, one might 
even argue that we must always regard the observer as part of the system we 
are observing. How could the data end up being collected otherwise? Of course, 
there are certain situations where we have reasons to regard the observer as 
being outside the system under consideration - namely those situations where 
the probabilities measured do not depend on the bulk properties of the observer. 
However, the important point is that we can do physics when this is not the case 
so long as we can have low key data devices. It is easy to imagine data process- 
ing devices which are not low key. For example, we could use a computer which 
stores information in the configuration of a number of large rocks rather than a
standard electronic computer to store and process data about a nearby
gravitationally sensitive experiment. Then the probabilities would depend on 
the detail of the data. 

The fact that we start with considerations about data where data is a collection
of actions and observations puts us in an operational or instrumental mode
of thinking. Operationalism played a big role in the discovery of both relativity
theory and QT. There are different ways of thinking about operationalism. We 
can either take it to be fundamental and assert that physical theories are about 
the behaviour of instruments and nothing more. Or we can take it to be a 
methodology aimed at finding a theory in which the fundamental entities are 
beyond the operational realm. In the latter case operationalism helps us put 
in place a scaffolding from which we can attempt to construct the fundamental 
theory. Once the scaffolding has served its purpose it can be removed leaving 
the fundamental theory partially or fully constructed. The physicist operates 
best as a philosophical opportunist (and indeed as a mathematical opportunist). 
For this reason we will not commit to either point of view for the time being 
noting only that the methodology of operationalism serves our purposes. In- 
deed, operationalism is an important weapon in our armory when we are faced 
with trying to reconcile apparently irreconcilable theories. A likely reason for 
any such apparent irreconcilability is that we are making some unwarranted 
assumptions beyond the operational realm. It was through careful operational 
reasoning that Einstein was able to see that absolute simultaneity is unneces- 
sary (since it has no operational counterpart). The operational methodology is 
a way of not making wrong statements. If we are lucky we can use it to make 
progress. 

4 Remarks on quantum theory 

When all is said and done, quantum theory provides a way of calculating prob- 
abilities. It is a probability calculus. Hence, its natural predecessor is not 
Newtonian mechanics or any other branch of classical physics, but rather what 
might be called classical probability theory (CProbT). Thus, in the same way 
that CProbT can be applied to various physical situations from classical physics 
(such as systems of interacting spins, particles in a gas, electromagnetic fields...) 
to calculate probabilities, quantum theory can be applied to various different 
physical situations (interacting quantum spins, a quantum particle in a poten- 
tial well, quantum fields, ....) to calculate probabilities. Quantum theory is, like 
classical probability theory, a meta theory with many realizations for different 
physical situations. 

One particular realization of QT is what might be called quantum mechanics 
(QM). This is the non-relativistic theory for multi-particle systems in which we 
introduce a wavefunction $\psi(x_1, x_2, \ldots, t)$ and the Schrödinger equation

$$i\hbar \frac{\partial \psi}{\partial t} = \hat{H} \psi \qquad (4)$$

to evolve the wavefunction. QM is an example of QT.

Quantum field theory (QFT) is another example of QT. The basic frame- 
work of quantum theory (which we will present in detail below) consisting of 
an evolving state $\hat{\rho}$ that acts on a Hilbert space $\mathcal{H}$ is capable of expressing both
non-relativistic quantum mechanics and relativistic QFT (see pg. 49 of pQ). 
This is clearest in the formulation of QFT in which we write down a superwavefunctional
$\Psi(\phi(x))$ which we can regard as a linear superposition of basis states
where these basis states correspond to definite configurations of the field $\phi(x)$.

This addresses a common misconception. We should regard QFT as a special 
case of QT rather than something like the converse. Thus we should not think 
of QT as a limiting case of QFT - though we might attempt to derive QM as 
the limit of QFT. It is not the case that QT, thus understood, is necessarily 
non-relativistic. The only point that should be added to these remarks is that 
QFT requires an infinite dimensional Hilbert space whereas we can do a lot in 
non-relativistic scenarios with finite dimensional Hilbert spaces. However, this 
is a technical rather than conceptual point and, in any case, there are good 
reasons to believe that a theory of quantum gravity will have something like 
a finite dimensional Hilbert space. In our discussion of QT we will stick with 
finite dimensional Hilbert spaces both for technical simplicity and because we 
are dealing with finite data sets. Issues related to infinities and continuities will 
be discussed in Sec. 03 

In going from QT to QFT we have to add much additional structure to QT. 
However, the deep conceptual novelties of QT are evident without going to QFT. 
For this reason it seems reasonable to look at ways to combine QT (rather than 
QFT) with GR. This way we can hope to import these conceptual novelties into 
a theory of QG without getting distracted by the additional structure of QFT. 
Ultimately we would require that a theory of QG incorporate QFT (at some 
appropriate level of approximation). However, we take the attitude that this is 
not likely to be important in the early days of the construction of QG and it 
may even be possible to fully construct QG before taking on this consideration. 

5 Basic framework for operational theories 

We want to give a simple operational formulation of CProbT and QT and for 
this purpose we present a framework which works for probabilistic theories that 
admit a universal background time. 

Figure 1: Sequence of operations acting on system. Each operation has a knob to
vary the operation enacted and an outcome which is recorded as data.

The basic scenario we consider is that shown in Fig. 1. This consists of a
sequence of operations on the system. We can represent these operations by
boxes. Each box has a knob on it which can be used to vary the operation
implemented. At each operation we have the possibility of extracting some
outcome $l$ (this is data). Each operation can be regarded alternatively as (a) a
preparation (since it outputs a system), (b) a transformation (since it transforms
the state of a system), and (c) a measurement (since it inputs a system and
outputs an outcome $l$). The same can be said of any sequence of such operations.
We can, strictly, only regard an operation as a preparation if it outputs the
system in a definite state. We define the state in the following way.

The state associated with a preparation is that thing represented by 
any mathematical object that can be used to calculate the probabil- 
ity for every outcome of every measurement that may be performed 
on the system. 

Given this definition we can define the state to be represented by a list of all 
probabilities, 

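$$\mathbf{P} = \begin{pmatrix} \vdots \\ p_\alpha \\ \vdots \end{pmatrix} \qquad (5)$$

where $\alpha$ labels every outcome of every possible measurement. We note that we
can write

$$p_\alpha = \mathbf{R}_\alpha \cdot \mathbf{P} \qquad (6)$$

where $\mathbf{R}_\alpha$ has a 1 in position $\alpha$ and 0's everywhere else. The object $\mathbf{P}$ contains a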
lot of information. In general we would expect a physical theory to afford some 
simplification so that some entries in P can be calculated from other entries. In 
fact we can insist that this be done by a linear formula so we have a state given 
by

$$\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_K \end{pmatrix} \qquad (7)$$

such that

$$p_\alpha = \mathbf{r}_\alpha \cdot \mathbf{p} \qquad (8)$$

The $p_k$s are the probabilities associated with a set of fiducial measurement
outcomes. We take $K$ to be the minimum number of entries in $\mathbf{p}$ that makes it
possible to write the state by a linear formula like this. That this will always
be possible is clear since we have (6) as a last resort.

A special measurement is the identity measurement $\mathbf{r}^I$ corresponding to the
measurement whose result is positive if any result is seen. In the case that the
state is normalized we have $\mathbf{r}^I \cdot \mathbf{p} = 1$. However, for technical reasons, we will
not normalize the state after each step.

It follows from the fact that these probabilities are given by a linear formula 
that the transformation of the state is given by a linear formula. Thus, if we 
obtain result $l$ the new state is

$$\mathbf{p} \rightarrow Z_l \mathbf{p} \qquad (9)$$

where $Z_l$ is a $K \times K$ real matrix. This is clear since each component of the new
state must be given by a linear combination of the components in the initial
state. The probability of outcome $l$ is

$$\text{prob}_l = \frac{\mathbf{r}^I \cdot Z_l \mathbf{p}}{\mathbf{r}^I \cdot \mathbf{p}} \qquad (10)$$

The state can be normalized by dividing by $\mathbf{r}^I \cdot \mathbf{p}$. However, this introduces
unnecessary non-linearities in the evolution of the state. It is more convenient 
to allow the state to be unnormalized and use the above formula for calculating 
probabilities. 

We may have more than one system. We need a way of setting up the 
framework for such composite systems. Both CProbT and QT turn out to be 
simple in this respect. The state of a composite system is given by specifying 
joint probabilities $p_{k_1 k_2}$ with $k_1 = 1$ to $K_1$ and $k_2 = 1$ to $K_2$.



6 Brief summary of classical probability theory 



Consider a classical system which can be in one of N distinguishable configura- 
tions (for example a bit has N = 2). We can write the state of this system by 
specifying a probability, $p_n$, for each configuration, $n$.

$$\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} \qquad (11)$$

Note that $K = N$. We define the identity measurement vector, $\mathbf{r}^I$, for CProbT
by

$$\mathbf{r}^I = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad (12)$$

Now we can state postulates of CProbT in compact form. 

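1. The state of a system is given by $\mathbf{p} \in S_N$ where $S_N$ is defined by (i) $p_n \geq 0$
and (ii) $\mathbf{r}^I \cdot \mathbf{p} \leq 1$.

2. The state $\mathbf{p}$ for a composite system 12 made from systems 1 and 2 has
components $p_{n_1 n_2}$ and belongs to $S_{N_1 N_2}$.

3. Any operation which transforms the state of a system and has classical
outcomes labeled by $l$ is associated with a set of $N \times N$ matrices $Z_l$ which
(i) map $S_N$ into $S_N$, and (ii) have the properties that $\mathbf{r}^I \cdot Z_l \mathbf{p} \leq \mathbf{r}^I \cdot \mathbf{p}$
and $\mathbf{r}^I \cdot (\sum_l Z_l) \mathbf{p} = \mathbf{r}^I \cdot \mathbf{p}$ for all states. The probability of outcome $l$ is

$$\text{prob}_l = \frac{\mathbf{r}^I \cdot Z_l \mathbf{p}}{\mathbf{r}^I \cdot \mathbf{p}} \qquad (13)$$

and the state after outcome $l$ is observed is

$$\mathbf{p} \rightarrow Z_l \mathbf{p} \qquad (14)$$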

A few notes of clarification are useful here. We deliberately do not impose the
normalization condition $\mathbf{r}^I \cdot \mathbf{p} = 1$ (though we impose the condition $\mathbf{r}^I \cdot \mathbf{p} \leq 1$
to keep these vectors bounded). First it is not necessary to normalize since the
denominator on the RHS of (13) ensures that the sum of probabilities over all
outcomes adds up to 1. Second, it is useful to allow the freedom not to normalize
since then we can regard $Z_l \mathbf{p}$ as a new state even though this new state is not
normalized. We can, if we wish, normalize a state by dividing it by $\mathbf{r}^I \cdot \mathbf{p}$.
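
The postulates can be checked on a single classical bit ($N = 2$). The Python sketch below is illustrative only: the noisy readout with a 10% error rate, the particular unnormalized state, and all function names are assumptions chosen for the example rather than anything specified in the text.

```python
def mat_vec(Z, p):
    """Apply a transformation matrix Z to a state vector p."""
    return [sum(z * x for z, x in zip(row, p)) for row in Z]


def identity_dot(p):
    """r^I . p, where r^I is the all-ones identity measurement vector."""
    return sum(p)


def outcome_probability(Z_l, p):
    """prob_l = (r^I . Z_l p) / (r^I . p); p need not be normalized."""
    return identity_dot(mat_vec(Z_l, p)) / identity_dot(p)


# Assumed example: a readout of the bit that reports the wrong value 10% of the time.
Z = {
    0: [[0.9, 0.0], [0.0, 0.1]],  # outcome "0" was reported
    1: [[0.1, 0.0], [0.0, 0.9]],  # outcome "1" was reported
}

# Deliberately unnormalized state with r^I . p = 0.5 (allowed since r^I . p <= 1).
p = [0.3, 0.2]

probs = {l: outcome_probability(Z_l, p) for l, Z_l in Z.items()}
p_after_0 = mat_vec(Z[0], p)  # unnormalized state after outcome 0 is observed
```

Here the two matrices sum to the identity, so the outcome probabilities add up to 1 even though the state was never normalized, which is the point of dividing by $\mathbf{r}^I \cdot \mathbf{p}$.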



7 Brief summary of quantum theory 

The postulates of quantum theory (stripped of additional structure pertaining 
to particular applications) can be written in the following compact form. 

1. The state of a system is given by a positive operator $\hat{\rho}$ acting on a complex
Hilbert space $\mathcal{H}$ with $\text{trace}(\hat{\rho}) \leq 1$.

2. A composite system 12 made from systems 1 and 2 has Hilbert space
$\mathcal{H}_1 \otimes \mathcal{H}_2$.

3. Any operation which transforms the state of a system and has classical
outcomes labelled by $l$ is associated with a set of trace non-increasing completely
positive linear maps (also known as superoperators), $\$_l(\cdot)$, where
$\sum_l \$_l$ is trace preserving. The probability of outcome $l$ is

$$\text{prob}_l = \frac{\text{trace}(\$_l(\hat{\rho}))}{\text{trace}(\hat{\rho})} \qquad (15)$$

and the state after outcome $l$ is observed is

$$\hat{\rho} \rightarrow \$_l(\hat{\rho}) \qquad (16)$$

A completely positive linear map $\$$ is one which, when extended to a composite
system 12 as $\$ \otimes I$ (where $I$ is the identity map on system 2), leaves the state for
the total system $\hat{\rho}_{12}$ positive regardless of the initial state of the total system and
the dimension of system 2. This property is required for the internal consistency
of the theory. There are two familiar special cases of superoperators. First there
are unitary maps,

$$\$(\hat{\rho}) = U \hat{\rho} U^{\dagger} \qquad (17)$$

where $U$ is a unitary operator (satisfying $U U^{\dagger} = I$). We can understand this as
an example of postulate 3 above where $l$ only takes one value (so we always see
the same result). Second there are projection operators,

$$\$(\hat{\rho}) = \hat{P} \hat{\rho} \hat{P} \qquad (18)$$

where $\hat{P}$ is a projection operator (satisfying $\hat{P}\hat{P} = \hat{P}$). The first example is
trace preserving and the second example is trace decreasing. There are many
other types of superoperator which, in one way or another, extrapolate between
these two extremes.
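
The two special cases can be compared on a single qubit. The following Python sketch is an illustration under stated assumptions: the Hadamard matrix as the unitary, the projector onto the first basis state, the particular density matrix, and the helper names are all chosen for the example and are not taken from the text.

```python
def matmul(A, B):
    """Matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def unitary_map(U, rho):
    """Superoperator of the form U rho U^dagger."""
    return matmul(matmul(U, rho), dagger(U))


def projection_map(P, rho):
    """Superoperator of the form P rho P for a projector P."""
    return matmul(matmul(P, rho), P)


s = 2 ** -0.5
H = [[s, s], [s, -s]]                 # Hadamard matrix (assumed example unitary)
P0 = [[1, 0], [0, 0]]                 # projector onto the first basis state
rho = [[0.75, 0.25], [0.25, 0.25]]    # a positive operator with trace 1

tr_unitary = trace(unitary_map(H, rho))        # trace is preserved
tr_projected = trace(projection_map(P0, rho))  # trace drops below trace(rho)
prob_projected = tr_projected / trace(rho)     # probability of this outcome
```

With these inputs the unitary map leaves the trace at 1 (up to rounding), while the projection reduces it to 0.75, which is also the probability of the corresponding outcome.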

States can be normalized at any time by applying the following formula

$$\hat{\rho} \rightarrow \frac{\hat{\rho}}{\text{trace}(\hat{\rho})} \qquad (19)$$

In calculating the probabilities for outcomes of measurements we are not neces- 
sarily interested in the evolution of the state of the system during measurement. 
To this end we can associate a set, $\{\hat{A}_l\}$, of positive operators with $\{\$_l\}$ such that

$$\text{prob}_l = \text{trace}(\hat{A}_l \hat{\rho}) = \text{trace}(\$_l(\hat{\rho})) \qquad (20)$$

if the state is normalized. There is a many to one linear map between $\$_l$ and
$\hat{A}_l$ (which can be written down explicitly). Usually positive operators $\hat{A}_l$ (or
something like them) are introduced in the postulates. However, this is not 
necessary. 

8 Quantum theory formulated in similar way to
classical probability theory 

We can reformulate QT in a way which resembles CProbT. We define 



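$$\hat{\mathbf{A}} = \begin{pmatrix} \hat{A}_1 \\ \hat{A}_2 \\ \vdots \\ \hat{A}_{N^2} \end{pmatrix} \qquad (21)$$

where $\hat{A}_k$, for $k = 1$ to $N^2$, are a fixed fiducial set of linearly independent positive
operators which span the space of positive operators acting on an $N$ dimensional
Hilbert space $\mathcal{H}_N$. We define the identity measurement vector, $\mathbf{r}^I$, for QT by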

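$$\mathbf{r}^I \cdot \hat{\mathbf{A}} = I \qquad (22)$$

where $I$ is the identity operator on $\mathcal{H}_N$ (an $N \times N$ matrix, since each fiducial
operator acts on $\mathcal{H}_N$). Since the operators $\hat{A}_k$ form a complete
linearly independent set, $\mathbf{r}^I$ is unique. Now we can restate the postulates of QT.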

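1. The state of a system is given by $\mathbf{p} \in S_N$ where $S_N$ is the set of $\mathbf{p}$ for
which (i) we can write $\mathbf{p} = \text{trace}(\hat{\mathbf{A}} \hat{\rho})$ where $\hat{\rho}$ is a positive operator acting
on $\mathcal{H}_N$ and (ii) $\mathbf{r}^I \cdot \mathbf{p} \leq 1$.

2. The states for a composite system 12 made from systems 1 and 2 belong
to $S_{N_1 N_2}$ derived from $\hat{\mathbf{A}}_1 \otimes \hat{\mathbf{A}}_2$ whose elements act on $\mathcal{H}_1 \otimes \mathcal{H}_2$.

3. Any operation which transforms the state of a system and has classical
outcomes labelled by $l$ is associated with a set of $N^2 \times N^2$ matrices $Z_l$ which
(i) are such that $Z_l \otimes I$ maps $S_{NM}$ into $S_{NM}$ for any ancillary system
of any $M$, and (ii) have the property that $\mathbf{r}^I \cdot Z_l \mathbf{p} \leq \mathbf{r}^I \cdot \mathbf{p}$ and also
$\mathbf{r}^I \cdot (\sum_l Z_l) \mathbf{p} = \mathbf{r}^I \cdot \mathbf{p}$ for all states. The probability of outcome $l$ is

$$\text{prob}_l = \frac{\mathbf{r}^I \cdot Z_l \mathbf{p}}{\mathbf{r}^I \cdot \mathbf{p}} \qquad (23)$$

and the state after outcome $l$ is observed is

$$\mathbf{p} \rightarrow Z_l \mathbf{p} \qquad (24)$$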

Note that for QT we have $K = N^2$. It is easy to show that

$$Z = \text{trace}(\hat{\mathbf{A}} \$(\hat{\mathbf{A}}^T)) [\text{trace}(\hat{\mathbf{A}} \hat{\mathbf{A}}^T)]^{-1} \qquad (25)$$

where the superscript $T$ denotes transpose.

9 Reasonable postulates for quantum theory



The postulates of QT, unlike those of CProbT, are rather abstract. It is not 
clear where they come from. However, QT can be obtained by a set of what 
might be regarded as reasonable postulates. Before stating these we need a
few definitions. 

The maximum number of reliably distinguishable states, 

N, is defined to be equal to the maximum number of members of 
any set of states which have the property that there exists some 
measurement device which can be used to distinguish the states in 
a single shot measurement (so that the sets of outcomes possible for 
each state in this set are disjoint). 

This quantity captures the information carrying capacity of the system (in QT
it is simply the dimension of the Hilbert space). More exactly, we might say
that the information carrying capacity of the system is $\log_2 N$ bits. We will say
that

A system is constrained to have information carrying capacity
$\log_2 M$ if the states are such that, for a measurement set to
distinguish $N$ distinguishable states, we only ever obtain outcomes
associated with some given subset of $M$ of these states.

We want to define a useful notion 

Equivalence classes. Two operations belong to the same equiv- 
alence class if replacing one by the other gives rise to the same 
probabilities 

An example might be measuring the polarization of a photon with a polarizing 
beamsplitter or a calcite crystal each orientated at an angle $\theta$. If one device is
replaced by the other (with an appropriate identification of outcomes between 
the two devices) then the probabilities remain the same. 

The spin degree of freedom of an electron and the polarization degree of 
freedom of a photon are, in each case, described by a Hilbert space of dimension 
2. There is a sense in which these two systems have the same properties. We 
define this idea as follows 

Two systems have the same properties if there is a mapping 
between equivalence classes of operations such that under this map- 
ping we get the same probabilities for outcomes for each type of 
system. 

This mapping might take us from an experiment involving an electron's spin de- 
gree of freedom to another experiment involving a photon's polarization degree 
of freedom. For example, the electron could be prepared with spin up along the 
z direction, sent through a magnetic field which (acting as a transformation) 
rotates the spin through 20° in the zx plane. Then the electron impinges on a 
Stern-Gerlach device orientated in the x direction. Under a mapping between
equivalence classes, this might correspond to an experiment in which a photon is 
prepared with vertical polarization. The photon passes through a crystal which 
rotates its polarization through 10° and then onto a polarizing beamsplitter 
orientated at angle 45°. The probabilities seen would be the same in each case 
(note that the angles must be halved for the photon since orthogonal states are 
vertical and horizontal rather than up and down). 
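
The claim that the two experiments give the same probabilities can be checked numerically. The sketch below is only illustrative: it assumes the standard cos^2(angle/2) rule for spin-1/2 and Malus's law for photon polarization, and it assumes both rotations are towards the measurement axis. The function names are hypothetical.

```python
import math

def spin_half_prob_plus(angle_deg):
    """P(+) for a spin-1/2 system whose spin direction makes angle_deg
    with the Stern-Gerlach axis: cos^2(angle / 2)."""
    return math.cos(math.radians(angle_deg) / 2) ** 2

def photon_prob_transmit(angle_deg):
    """Malus's law for a photon whose polarization makes angle_deg with
    the transmission axis of the beamsplitter: cos^2(angle)."""
    return math.cos(math.radians(angle_deg)) ** 2

# Electron: spin up along z, rotated 20 degrees towards x, measured along x.
# The spin direction is then 90 - 20 = 70 degrees from the x axis.
electron = spin_half_prob_plus(90 - 20)

# Photon: vertical polarization, rotated 10 degrees, analysed at 45 degrees.
# The polarization is then 45 - 10 = 35 degrees from the analyser axis.
photon = photon_prob_transmit(45 - 10)
```

Both quantities equal cos^2(35 degrees), roughly 0.671, which is the halving of angles noted in the text.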
We define 

Pure states are states which cannot be simulated by mixtures of 
other distinct states 

Also we define 

A reversible transformation is a transformation, $T$, for which
there exists another transformation $T^{-1}$ which is such that if $T$ is
applied, then $T^{-1}$, the overall transformation leaves the incoming
state unchanged (it is the identity transformation) for all
incoming states

and 

A continuous transformation is one which can be enacted by se- 
quential application of infinitely many infinitesimal transformations 
where an infinitesimal transformation is one which has the property 
that it only has an infinitesimal effect on the probability associated 
with any given outcome for any measurement that may be performed 
on the state 

Given these definitions, we are in a position to state the postulates 

Postulate Probabilities. Relative frequencies (measured by taking the pro- 
portion of times a particular outcome is observed) tend to the same value 
(which we call the probability) for any case where a given measurement is 
performed on an ensemble of n systems prepared by some given preparation
in the limit as n becomes infinite. 

Postulate 1 Information. There exist systems for which $N = 1, 2, \ldots$, and,
furthermore, systems having, or constrained to have, the same information
carrying capacity have the same properties.

Postulate 2 Composite systems. A composite system 12 consisting of subsystems
1 and 2 satisfies $N_{12} = N_1 N_2$ and $K_{12} = K_1 K_2$.

Postulate 3 Continuity. There exists a continuous reversible transformation 
on a system between any two pure states of that system for systems of 
any dimension N. 

Postulate 4 Simplicity. For each given N, K takes the minimum value con- 
sistent with the other axioms. 
The crucial postulate here is Postulate 3. If the single word "continuous" is
dropped from these axioms, then we get classical probability theory rather than 
quantum theory. 

These axioms give rise to the full structure of quantum theory with operators
on finite dimensional complex Hilbert space as described above. The construction
theorem is simple but rather lengthy and the reader is referred to [2] for
details. However, the main ideas are as follows. It follows from Postulate 1
that $K = K(N)$ and $K(N + 1) > K(N)$ (this second point is not obvious). It
then follows from Postulate 2, after a little number theory, that $K = N^r$ where
$r = 1, 2, 3, \ldots$. By the simplicity postulate we should take the smallest value
of $r$ that works. First we try $r = 1$ but this fails because of the continuity
postulate. Then we try $r = 2$ and this works. Thus, we have $K = N^2$. (As an
aside, if we dropped the word "continuous" we get $r = 1$ and hence $K = N$. This
leads very quickly to classical probability theory.) Next we take the simplest
nontrivial case $N = 2$, and $K = 4$. If we just consider normalized states then
rather than 4 probabilities we have three. We apply the group of continuous 
reversible transformations (implied by the continuity postulate) to show that 
the set of states must live inside a ball (with pure states on the surface). This 
is the Bloch ball of quantum theory for a two dimensional Hilbert space. Thus, 
we get the correct space of states for two dimensional Hilbert space. We now 
apply the information postulate to the general N case to impose that the state 
restricted to any two dimensional Hilbert space behaves as a state for Hilbert 
space of dimension 2. By this method we can construct the space of states for 
general N. Various considerations give us the correct space of measurements 
and transformations and the tensor product rule and, thereby, we reconstruct 
quantum theory for finite N. 

10 Remarks on general relativity 

General relativity was a result of yet another attempt to make two theories 
consistent, namely special relativity and Newton's theory of gravitation. Ein- 
stein gives various reasons that Galileo's principle of invariance is not sufficiently 
general since it applies only to inertial frames. He replaces it with the following 

The general laws of nature are to be expressed by equations which 
hold good for all systems of co-ordinates, that is, are covariant with
respect to any substitutions whatever (generally covariant) [3].

He then employs the equivalence principle to argue that the metric repre- 
sents the gravitational field. He sets up the mathematics of tensors as objects 
which have the property that a physical law expressible by setting all the com- 
ponents of a tensor equal to zero is generally covariant. Out of these ideas he is 
able, by an ingenious chain of reasoning, to obtain field equations for GR which 
can be expressed as 

$$G_{\mu\nu} = 8\pi T_{\mu\nu} \qquad (26)$$

$G_{\mu\nu}$ is the Einstein tensor and is a measure of the local curvature concocted out of
derivatives of the metric. $T_{\mu\nu}$ is the stress-energy tensor and is determined by
the local properties of matter.

We can distinguish two roles for the metric. 

1. It determines the set of local frames in which gravitational fields vanish
and inertial physics applies locally. This property is given by the local
value of the metric without taking into account its variation with $x^\mu$.

2. It tells us how to compare local fields at two different points by providing
a way to parallel transport such fields. This it does via the connection
$\Gamma^{\alpha}_{\mu\nu}$ and the machinery of covariant differentiation. This property does
depend on the variation of the metric with $x^\mu$.

The metric also determines causal structure. It tells us whether two events are 
space-like or time-like and thereby determines whether we can send a signal from
one to the other. However, the metric is a dynamical feature of the theory. It is
determined by solving the Einstein field equations. Hence the causal structure
is dynamical. We can try to express this in more operational terms. Given
local events with local labels $x^\mu$ (which may be abstract or read off some real
physical reference frame) there is no way in general to say, in advance (that 
is without solving the equations), whether it is possible to send a signal from 
one to another. In non-gravitational physics we have fixed causal structure. 
For example, in SR the metric is fixed and so we know the causal structure in 
advance. 
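
To make the contrast concrete, the following sketch classifies a small displacement once a metric is given. It is only an illustration under stated assumptions: units with c = 1, signature (-, +, +, +), and a hypothetical helper named interval_type. In SR the metric below is known in advance; in GR the metric argument would only be available after solving the field equations.

```python
def interval_type(metric, dx):
    """Classify a small displacement dx using ds^2 = g_mu_nu dx^mu dx^nu.

    Assumes signature (-, +, +, +), so ds^2 < 0 means time-like.
    """
    ds2 = sum(metric[m][n] * dx[m] * dx[n]
              for m in range(4) for n in range(4))
    if ds2 < 0:
        return "time-like"
    if ds2 > 0:
        return "space-like"
    return "null"

# Fixed Minkowski metric of special relativity (c = 1).
MINKOWSKI = [
    [-1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

purely_temporal = interval_type(MINKOWSKI, [1, 0, 0, 0])
purely_spatial = interval_type(MINKOWSKI, [0, 1, 0, 0])
light_ray = interval_type(MINKOWSKI, [1, 1, 0, 0])
```

A signal can be sent along time-like or null displacements only, so the classification is exactly the causal information the metric carries.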

GR is a deterministic theory. In classical physics we can always introduce 
probabilities simply by applying CProbT. However, CProbT assumes a fixed 
causal structure just as QT does. In particular it assumes a fixed background 
time. Hence, we would not expect to be able to apply CProbT to GR in a 
straightforward manner. Indeed, we could consider the program of unifying
CProbT and GR to find what we might call probabilistic general relativity
(ProbGR). This program might be expected to share many of the same difficulties
we face in the program to find a theory of QG. We have taken as our goal to find 
a framework for probabilistic theories which admit dynamic causal structure. 
This framework should include ProbGR as well as QG. Hence, we will need to 
introduce further principles to get QG rather than ProbGR. For such princi- 
ples we can look to the differences between CProbT and QT. These differences 
are especially clear in the formulation of QT in Sec. 8 which looks similar to
CProbT and in the postulates introduced in Sec. 9 (see for a discussion of
the differences between CProbT and QT). 

11 Remarks on the problem of finding a theory 
of quantum gravity 

The most obvious issue that arises when attempting to combine QT with GR 
is that QT has a state on a space-like surface that evolves with respect to 
an external time whereas in GR time is part of a four dimensional manifold
whose behavior is dynamically determined by Einstein's field equations. We 
can formulate GR in terms of a state evolving in time - namely a canonical 
formulation [S] EJ. Such formulations are rather messy (having to deal with 
the fact that time is not a non-dynamical external variable) and break the 
elegance of Einstein's manifestly covariant formulation. Given that Einstein's 
chain of reasoning depended crucially on treating all four space-time coordinates 
on an equal footing it is likely to be at least difficult to construct QG if we 
make the move of going from a four dimensional manifold $\mathcal{M}$ to an artificial
splitting into a three dimensional spatial manifold $\Sigma$ and a one dimensional time
manifold $\mathbb{R}$. But there is a further reason coming from quantum theory that
suggests it may be impossible. If the causal structure is dynamically determined 
then what constitutes a space-like surface must also be dynamically determined. 
However, in quantum theory we expect any dynamics to be subject to quantum 
uncertainty. Hence, we would expect the property of whether a surface is space- 
like or not to be subject to uncertainty. It is not just that we must treat space 
and time on an equal footing but also that there may not even be a matter-of- 
fact as to what is space and what is time even after we have solved the equations 
(see [Z1IH]). To this end we will give a framework (which admits a formulation of 
quantum theory) which does not take as fundamental the notion of an evolving 
state. The framework will, though, allow us to construct states evolving through 
a sequence of surfaces. However, these surfaces need not be space-like (indeed, 
there may not even be a useful notion of space-like). 

Another way of thinking about these issues is by considering how we might 
probe causal structure. The most obvious way to do this is to use light signals 
since they probe the light cone structure which underpins causal structure in 
GR. In GR we are typically interested in cases where the presence of light 
represents only a small perturbation and so we freely employ counterfactual 
reasoning in which we imagine sending or not sending light signals without 
having to consider the effect this has on the solution to Einstein's field equations. 
On the other hand, in QT, the presence or not of even a single photon can 
have a dramatic effect on what is seen. The most clear example of this is 
provided in [9] in which an interference effect involving one photon depends
on whether the path of another photon is blocked or not. In QT we have a 
fixed background causal structure (which might be implemented for example 
by bolting apparatuses down to a rigid structure) and so there is no need to 
employ such reasoning about the counterfactual transmission of photons for 
the purposes of understanding causal structure. However, in QG, we will not 
assume that there is a fixed background causal structure. We cannot assume 
that two regions of space-time have a certain causal relationship in the absence 
of any photon being transmitted between them just because we know that their 
causal relationship would be fixed if a photon were to be so transmitted. This 
line of thinking lends separate support to the possibility mentioned above that 
there may not even be a matter-of-fact as to what is space and what is time 
even after the equations have been solved. 

A more mathematical handle on this issue can be gained by considering the
various ways in quantum theory we can put together pairs of operators $A$ and $B$.
We can form the product $AB$. We can take the tensor product $A \otimes B$. A third
slightly more subtle example is $A?B$ where $?$ stands for an unknown operator.
The first and third of these examples correspond to a time-like situation
whereas the second case corresponds to a space-like situation (or at least an
equal time situation). Each of these cases is treated on a different footing in
QT. In GR we initially treat space and time on an equal footing. Thus, we
introduce four space-time coordinates $x^\mu$ (with $\mu = 0, 1, 2, 3$) giving rise to the
intervals $\delta x^\mu$. We do this without saying which of these intervals (or which linear
combinations of them) are time-like. We then solve Einstein's field equations
and obtain a metric $g_{\mu\nu}$. From the metric (which has a Lorentzian signature)
we can identify which linear combinations pertain to time-like intervals. But, 
at least in principle, we do this after we have solved the field equations. Thus, 
by analogy, if we are to treat space and time on an equal footing in QG as we 
do in GR then we would also want to put those objects in the theory of QG 
which correspond to the three types of product in QT mentioned above on an 
equal footing. This should already be an issue in special relativistic quantum 
theory though since the causal structure is fixed in advance it is not essential 
we attend to it. But in QG it is likely to be quite essential. In fact, in QG 
the issue is likely to be even more serious than it is in GR since it may be, as 
mentioned above, that even after the equations have been solved, we are unable 
to identify what intervals are space-like and what are time-like. In order to 
address this issue we will define a new type of product which unifies all these 
types of products in quantum theory (and their counterparts in more general 
probabilistic frameworks) putting them on an equal footing. 

We should ultimately be interested in experiments to test a theory of QG. 
Before we get to real experiments it is interesting to consider gedankenexper- 
iments which illuminate the conceptual structure of a theory. As we noted in 
the introduction, a theory of QG would be necessary in a situation in which 
we could not neglect the particular effects of both GR and QT. The type of 
gedankenexperiment in which this is going to be the case will be one where we 
simultaneously have dynamic causal structure and quantum superposition. Such 
a situation occurs when we look for quantum interference of a massive object 
which goes into a superposition of being in two places at once. Gedankenex- 
periments of this type have been discussed by Penrose and there has been 
some effort to design a realizable version of this type of experiment [TT] . 

12 Setting the scene 

We repeat our assertion from Sec. A physical theory, whatever else it does, 
must correlate recorded data. Data is a record of actions and observations taken 
during an experiment. We will assume that this data is recorded onto cards. 
Each card will record a small amount of proximate data. We will illustrate 
what we mean by this soon with examples. Thus the cards represent something 
analogous to Einsteinian events. One piece of data recorded on any given card
will be something we will regard as representing or being analogous to space-time
location. Of course, it is not necessary to record the data onto cards. It
could be recorded in a computer's memory, or by any other means. But this
story with the cards will help us in setting up the mathematical framework we
seek.

Figure 2: A source of electrons is followed by four Stern-Gerlach apparatuses shown
schematically here. Data is recorded onto a card at each apparatus.

We will consider examples where the data recorded on each card is of the form 
(x,a,s). The first piece of data, x, is an observation and represents location. 
For example, it could be the space-time position read off some real physical 
reference frame such as a GPS system or a reference mollusc [13]. Or it
might be some other data that we are simply going to regard as representing 
location. The second piece of data, a, represents some actions. For example it 
might correspond to the configuration of some knobs we have freedom in setting. 
The third piece of data, s, represents some further local observations. Here are 
two examples of this type 

1. Imagine we have a sequence of four Stern-Gerlach apparatuses (shown 
schematically in Fig. 2) labelled x = 1,2,3,4 preceded by a source of
electrons labelled x = 0. We can set each Stern-Gerlach apparatus to
measure spin along the direction a. Then we record the outcome s = 
±1/2. Thus we get a card from each Stern-Gerlach apparatus with (x, a, s) 
written on it. We would also want to extract a card (or maybe a set of 
cards) from the apparatus which prepares the electrons. For example we 
could write (0, a, s) where a = "source appropriately constructed", and
s = "observations consistent with source's proper functioning seen". From
each run of the experiment we would collect a stack of five cards. We 
could vary the settings a on the Stern-Gerlach apparatuses. To extract 
probabilities we would want to run the experiment many times. 

2. Imagine we have a number of probes labelled $n = 1, 2, \ldots, N$ drifting
in space (see Fig. 3). Each probe has a clock which ticks out times
$t_n = 1, 2, \ldots, T$. Each probe has knobs on it which correspond to the
settings of some measurement apparatuses on the probe. We let the
different configurations of these knobs be labelled by $a$. Further, each
probe has some meters which record the outcomes of those experiments.
We let $s$ label these outcomes. On each card we can record the data
$(x = (n, t_n, t_n^m), a, s)$ where $t_n^m$ is the retarded time of probe $m$ as seen
at probe $n$. We may want to put more information into $x$, such as the
observed relative orientations. And we may want to reduce the amount of
information in $x$, for example we could only record the retarded times of
three specific probes (providing something like a GPS system). We take
one card from each probe for each tick of the probe clock. At the end
of each run of the experiment we will have a stack of $NT$ cards. The
clocks can be reset and the experiment repeated many times to obtain
probabilities.

Figure 3: We have a number of probes (two shown here) drifting in space. Data is
collected onto a card at each probe at each tick of the probe's clock.

In these cases the number of cards in the stack is the same from one run to the 
next but this need not be the case. 
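
The first example can be sketched as a toy simulation that produces one stack of five (x, a, s) cards per run. This is only an illustration: the Card type, the run_experiment function, the unpolarized source, and the use of a single angle for each setting a are our assumptions, and the cos^2 rule used for successive outcomes is the standard spin-1/2 rule rather than anything derived in this section.

```python
import math
import random
from collections import namedtuple

# A card records location x, action a and local observation s.
Card = namedtuple("Card", ["x", "a", "s"])

def run_experiment(settings, rng):
    """One run: a source (x = 0) followed by four Stern-Gerlach apparatuses.

    settings maps x = 1..4 to an angle a in degrees. Returns a stack of
    five cards. The source is assumed to be unpolarized.
    """
    stack = [Card(0, "source appropriately constructed",
                  "observations consistent with source's proper functioning seen")]
    spin_angle = None  # direction of the spin after the last measurement
    for x in (1, 2, 3, 4):
        a = settings[x]
        if spin_angle is None:
            p_up = 0.5
        else:
            p_up = math.cos(math.radians(a - spin_angle) / 2) ** 2
        s = 0.5 if rng.random() < p_up else -0.5
        # After the measurement the spin points along +a or along -a.
        spin_angle = a if s > 0 else a + 180
        stack.append(Card(x, a, s))
    return stack

rng = random.Random(0)
settings = {1: 0, 2: 0, 3: 90, 4: 90}
stack = run_experiment(settings, rng)
```

Repeating run_experiment many times with the same settings gives the collection of stacks from which relative frequencies would be extracted.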

We introduce the important notion of a procedure, F, which tells the experimentalist
what actions to implement - that is how to set the knobs. In our
examples above (where we record data (x, a, s) onto the cards) we allow the 
choice of a to depend on x by some function a = F(x). After a procedure has 
been implemented we will end up with a stack, X, of cards. This stack contains 
all the data for the experiment. 

We may have more complicated ways of recording data onto cards. A useful 
example is where we record (x, q, a, s) onto the cards where q represents some 
observations we do not want to regard as part of x which we make immediately 
before implementing the action a. In this case we can have a = F(x, q).

We may wish to condition F at x on some data collected "previously" at x'.
For example we might want to have a = F(x, s') where s' is data recorded at
x'. However, since we do not want to assume we have fixed causal structure, it
is a matter of the physical dynamics as to whether data recorded at x' will be
available at x. Therefore it makes no sense to allow this functional dependence.
Rather, any such dependence must be implemented physically. For example,
q could be equal to the retarded value of s' seen at x. Then we can write
a = F(x, q).

It is possible that some cards in the stack will be repeated. This could happen 
for example if the clocks in the second example above were faulty and sometimes 
ticked out the same value twice. To get round this we replace repeated cards 
by a new card having the same data plus the multiplicity appended to s. 
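
The replacement of repeated cards can be sketched as follows. The function name merge_repeated_cards and the choice to store the multiplicity as a pair (s, multiplicity) are hypothetical; the text only requires that the multiplicity be appended to s.

```python
from collections import Counter

def merge_repeated_cards(stack):
    """Return the stack as a set of cards, replacing any repeated (x, a, s)
    card by a single card with the multiplicity appended to s."""
    counts = Counter(stack)
    merged = set()
    for (x, a, s), multiplicity in counts.items():
        if multiplicity > 1:
            merged.add((x, a, (s, multiplicity)))
        else:
            merged.add((x, a, s))
    return merged

# A faulty clock has produced the same card twice.
raw_stack = [(1, "a0", 0.5), (1, "a0", 0.5), (2, "a0", -0.5)]
merged = merge_repeated_cards(raw_stack)
```

After this step every card in a stack is distinct, so a stack can be treated as a set.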

13 Thinking inside the box 

After each run of the experiment we will have a stack of cards which we denote 
by X. We can bundle these together and attach a tag to this bundle specifying 
F. This can be repeated many times for each possible F. We imagine that all 
these tagged bundles are sent to a man in a sealed room who analyzes them (see 
Fig. 4). The man cannot look outside the sealed room for extra clues. Hence,
all concepts must be defined in terms of the cards themselves. The point of this 
story is that it enforces a particular kind of honesty. The man is not able to 
introduce what Einstein called "factitious causes" [3] such as an unobservable 
global inertial reference frame since he is forced to work with the cards. 

The order of the cards in any particular stack does not, in itself, constitute 
recorded data and is of no significance. Likewise the order the bundled stacks 
arrive in the sealed room is also of no significance. Thus, in his analysis of the 
cards, the man in the sealed room should not use these orderings. We could 
imagine that each stack is shuffled before being bundled and the bundles are 
then shuffled before being sent to the sealed room. Of course, it is significant 
which bundle a particular card belongs to and so we should not allow cards from 
different bundles to get mixed up. 

To aid our analysis we begin with a few simple definitions. 

The stack, X, is the set of cards collected in one run of an experiment.

Figure 4: Data is collected on cards from an experiment, bundled and tagged with a
description of the procedure followed. This tagged bundle is sent to a man in a sealed
room. This is repeated many times for each procedure.

The full pack, V, is the set of all logically possible cards when all
possible procedures are taken into account. It is possible that some
cards never actually occur in any stack because of the nature of the
physical theory but we include them in V anyway.

The procedure will be specified by that set of cards F which is the 
subset of all cards in V which are consistent with the procedure F. 
For example, if the data on each card is of the form (x, a, s) then the 
set F is all cards of the form (x, F(x), s). We deliberately use the
same symbol, F, to denote the abstract procedure F, the function 
F(x), and the set F since it will be clear from context which meaning 
is intended. 

We have 

$$X \subseteq F \subseteq V \qquad (27)$$

We note that these definitions are in terms of the cards as required. 
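
The three definitions and the inclusion (27) can be illustrated with ordinary sets. This is a toy sketch: the locations, knob settings and outcomes, and the names procedure_function and procedure_set, are made up for the example.

```python
from itertools import product

LOCATIONS = (1, 2)
ACTIONS = ("up", "side")   # hypothetical knob settings
OUTCOMES = (+1, -1)

# Full pack V: every logically possible card (x, a, s).
V = set(product(LOCATIONS, ACTIONS, OUTCOMES))

def procedure_function(x):
    """The function a = F(x) for one particular procedure."""
    return "up" if x == 1 else "side"

def procedure_set(F):
    """The set F: all cards of the form (x, F(x), s)."""
    return {(x, a, s) for (x, a, s) in V if a == F(x)}

F_set = procedure_set(procedure_function)

# One possible stack X from a run in which procedure F was followed.
X = {(1, "up", +1), (2, "side", -1)}

inclusion_holds = X <= F_set <= V
```

Here V has 8 cards and the set F has 4, one for each combination of location and outcome.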

As described we imagine repeating the experiment many times. In Sec. 1351 
we will suggest an approach that does not involve repeating the experiment as 
suggested here. 
14 Regions



We continue to provide definitions in terms of the cards. We define the notion 
of a region. The region $R_{\mathcal{O}}$ is specified by the set of cards from $V$ having $x \in \mathcal{O}$.
We define $R_x$ to be an elementary region consisting only of the cards having $x$
on them. Regions can be regarded as the seat of actions. In a region we have 
an independent choice of which action to implement. This captures the notion 
of space-time regions as places where we have local choices. 

When we have a particular run of the experiment we end up with a stack X 
of data. We can allocate this data to the appropriate regions. Then we have a 
picture of what happened as laid out in a kind of space-time. For a region $R_1$
(which we take to be shorthand for $R_{\mathcal{O}_1}$) we define the stack in $R_1$ to be

$$X_{R_1} = X \cap R_1 \qquad (28)$$

These are the cards from the stack, $X$, that belong to region $R_1$.
We define the procedure in $R_1$ to be

$$F_{R_1} = F \cap R_1 \qquad (29)$$

These are the cards from the set $F$ which belong to $R_1$. Clearly

$$X_{R_1} \subseteq F_{R_1} \subseteq R_1 \qquad (30)$$

Given $(X_{R_1}, F_{R_1})$ we know "what was done" ($F_{R_1}$) and "what was seen" ($X_{R_1}$)
in region $R_1$.
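
Equations (28) to (30) amount to set intersections, as the following sketch shows. The pack, the procedure and the stack used here are invented for illustration, and region is a hypothetical helper.

```python
from itertools import product

# Full pack of cards (x, a, s) for three locations.
V = set(product((1, 2, 3), ("up", "side"), (+1, -1)))

def region(locations):
    """All cards from V whose x lies in the given set of locations."""
    return {card for card in V if card[0] in locations}

F = {card for card in V if card[1] == "up"}          # procedure: always "up"
X = {(1, "up", +1), (2, "up", -1), (3, "up", +1)}    # one stack

R1 = region({1, 2})
X_R1 = X & R1    # what was seen in R1, as in (28)
F_R1 = F & R1    # what was done in R1, as in (29)
```

The chain X_R1, F_R1, R1 is nested as in (30) because X is a subset of F.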

15 Statement of objective 

We seek to find a formalism to calculate conditional probabilities of the form 

$$\mathrm{prob}(X_{R_1} | X_{R_2}, F_{R_1}, F_{R_2}) \qquad (31)$$

when these probabilities are well defined without imposing any particular causal 
structure in advance. Of course, any particular theory we might cast in terms of 
this formalism is likely to have some sort of causal structure built in and this
will be evident in the particular form the mathematical objects in the formalism 
end up taking. 

If the above probability is given a frequency interpretation it is equal to 

$$\frac{N(X_{R_1}, X_{R_2}, F_{R_1}, F_{R_2})}{N(X_{R_2}, F_{R_1}, F_{R_2})} \qquad (32)$$

in the limit as the denominator becomes large. Here $N(\cdot)$ is the number of
stacks satisfying the given condition. 
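
The frequency interpretation can be sketched as a counting routine over tagged bundles. This is a minimal illustration: each bundle is represented as a pair (F, X) of Python sets, the function name conditional_frequency is hypothetical, and the four bundles are made-up data rather than results.

```python
def conditional_frequency(bundles, R1, R2, X_R1, X_R2, F_R1, F_R2):
    """Estimate prob(X_R1 | X_R2, F_R1, F_R2) by counting stacks.

    bundles is a list of (F, X) pairs. Returns None when no stack
    satisfies the conditioning, so that no value is predicted.
    """
    numerator = 0
    denominator = 0
    for F, X in bundles:
        if F & R1 != F_R1 or F & R2 != F_R2 or X & R2 != X_R2:
            continue
        denominator += 1
        if X & R1 == X_R1:
            numerator += 1
    if denominator == 0:
        return None
    return numerator / denominator

# Toy pack with a single action "a", two locations and outcomes 0 or 1.
V = {(x, "a", s) for x in (1, 2) for s in (0, 1)}
F = set(V)
R1 = {card for card in V if card[0] == 1}
R2 = {card for card in V if card[0] == 2}

bundles = [
    (F, {(1, "a", 1), (2, "a", 1)}),
    (F, {(1, "a", 0), (2, "a", 1)}),
    (F, {(1, "a", 1), (2, "a", 1)}),
    (F, {(1, "a", 1), (2, "a", 0)}),
]

estimate = conditional_frequency(
    bundles, R1, R2, {(1, "a", 1)}, {(2, "a", 1)}, F & R1, F & R2)
```

With these four bundles, three stacks satisfy the conditioning and two of them also show the required outcome in R1, so the estimate is 2/3. With so few stacks this is of course far from the large-denominator limit.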

This probability may not be well defined if there is insufficient conditioning. 
To see this consider the example with four Stern-Gerlach apparatuses illustrated 
in Fig. 2. Let $R_1$ consist of all the cards associated with the fourth Stern-Gerlach
apparatus (having x = 4) and let $R_2$ consist of all the cards associated with the
second apparatus (having x = 2). Then we cannot expect this probability to be
well defined since we do not know what angle the third Stern-Gerlach apparatus 
has been set at. In such cases we do not require the formalism to predict any 
value for the probability. 

To aid our considerations we will restrict our attention to a region R for 
which all probabilities 

$$\mathrm{prob}(X_R | F_R, C) \qquad (33)$$

are well defined where $C$ is some condition on the cards outside $R$. We will
call such a region, $R$, a predictively well defined region with respect to the
conditioning $C$. This region can be very big (consisting of a substantial fraction
of the cards from $V$). We will assume that we are only interested in probabilities
inside this region and, since it is always implicit, we will drop the $C$ writing

$$\mathrm{prob}(X_R | F_R) \qquad (34)$$

An example of a predictively well defined region might be provided by the data 
coming out of a quantum optical laboratory. We would have to set up the 
laboratory, the optical table, the lasers, the various optical elements and the 
computers to record data. Having set this up we would want to keep the doors 
of the laboratory shut, the shutters on the windows down, and condition on 
many other actions and observations to isolate the experiment from stray light. 
All this data could go into $C$. In setting up the formalism we will assume that
$C$ is always satisfied and that it is the purpose of a physical theory to correlate
what goes on inside $R$. Having set up the formalism we will discuss ways to avoid
having to have a predictively well defined region and having to conditionalize
on C (see Sec. E2 and SecEJ. 

16 States 

Now consider region $R_1$ inside the predictively well defined region $R$ (see Fig.
5(a)). Then (34) can be written

$$p = \mathrm{prob}(X_{R_1} \cup X_{R-R_1} | F_{R_1} \cup F_{R-R_1}) \qquad (35)$$

We will regard $(X_{R-R_1}, F_{R-R_1})$ which happens in $R - R_1$ as a generalized preparation
for region $R_1$. Associated with this generalized preparation for $R_1$ is a
state which we will define shortly. We will regard $(X_{R_1}, F_{R_1})$ which happens in
$R_1$ as a measurement. We will label measurements in $R_1$ with $\alpha_1$. Then we can
write

$$p_{\alpha_1} = \mathrm{prob}(X^{\alpha_1}_{R_1} \cup X_{R-R_1} | F^{\alpha_1}_{R_1} \cup F_{R-R_1}) \qquad (36)$$

We define

The state for $R_1$ associated with a generalized preparation in $R - R_1$
is defined to be that thing represented by any mathematical object
which can be used to predict $p_{\alpha_1}$ for all $\alpha_1$.
Figure 5: (a) Region $R_1$ inside the predictively well defined region $R$. (b) Regions $R_1$
and $R_2$ inside predictively well defined region $R$.



Note that quite deliberately we define this for the joint probabilities in (36)
rather than the conditional probabilities $\mathrm{prob}(X^{\alpha_1}_{R_1} | X_{R-R_1}, F^{\alpha_1}_{R_1} \cup F_{R-R_1})$ even
though the latter may seem more natural. The reason for this is that introducing
conditional probabilities requires normalization by Bayes formula which
introduces nonlinearities. It turns out that these nonlinearities would represent
an insurmountable problem in the case where we have dynamic causal structure 
and so it is better to work with the joint probabilities. As we will see, we can 
use Bayes formula in the final step when calculating conditional probabilities so 
there is no problem. 

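Given the above definition we could write the state as

$$\mathbf{P}(R_1) = \begin{pmatrix} \vdots \\ p_{\alpha_1} \\ \vdots \end{pmatrix} \qquad (37)$$

We can then write

$$p_{\alpha_1} = \mathbf{R}_{\alpha_1}(R_1) \cdot \mathbf{P}(R_1) \qquad (38)$$

where the vector $\mathbf{R}_{\alpha_1}(R_1)$ has a 1 in position $\alpha_1$ and 0's in all other positions.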

Now, we would expect a physical theory to have some structure such that
it is not necessary to list all probabilities as in $\mathbf{P}(R_1)$ but rather only some
subset of them. Thus, we pick out a set of fiducial measurements $(X^{k_1}_{R_1}, F^{k_1}_{R_1})$,
where $k_1 \in \Omega_1$, such that we can write a general probability by means of a linear
formula

$$p_{\alpha_1} = \mathbf{r}_{\alpha_1}(R_1) \cdot \mathbf{p}(R_1) \qquad (39)$$

where

$$\mathbf{p}(R_1) = \begin{pmatrix} \vdots \\ p_{k_1} \\ \vdots \end{pmatrix} \quad \text{where } k_1 \in \Omega_1 \qquad (40)$$

now represents the state with

$$p_{k_1} = \mathrm{prob}(X^{k_1}_{R_1} \cup X_{R-R_1} | F^{k_1}_{R_1} \cup F_{R-R_1}) \qquad (41)$$

and where $K_1 = |\Omega_1|$ is taken to be the minimum number of probabilities in
such a list. It is clear we can always do this since, as a last resort, we have
(38). Omega sets such as $\Omega_1$ will play a big role in this paper. They are not, in
general, unique but we can always pick one and stick with it.

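The vector $\mathbf{r}_{\alpha_1}(R_1)$ is associated with the measurement $(X_{R_1}^{\alpha_1}, F_{R_1}^{\alpha_1})$ in $R_1$.
The fiducial measurements are represented by

$$\mathbf{r}_{k_1} = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \quad \text{for all } k_1 \in \Omega_1 \qquad (42)$$

where the 1 is in the $k_1$ position since this is the only way to ensure that
$\mathbf{r}_{k_1} \cdot \mathbf{p} = p_{k_1}$ as required.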
We define $\Lambda_{\alpha_1}^{k_1}$ by

$$\mathbf{r}_{\alpha_1} = \sum_{k_1 \in \Omega_1} \Lambda_{\alpha_1}^{k_1}\, \mathbf{r}_{k_1} \qquad (43)$$

We will define further lambda matrices in the next section. Like omega sets,
they will play a central role in this work. They give a quantitative handle on
the amount of compression the physical theory provides. Given (42) it is clear
that here the lambda matrices are just the components of the vector $\mathbf{r}_{\alpha_1}$. That
is

$$\mathbf{r}_{\alpha_1}\big|_{k_1} = \Lambda_{\alpha_1}^{k_1} \qquad (44)$$
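As a minimal sketch of how (43) and (44) might be used in practice, assume (hypothetically, this setup is not taken from the paper) that we are handed a table giving the joint probability $p_{\alpha_1}$ of every measurement label under several preparations, and that these preparations span the space of states. An omega set can then be picked greedily as a maximal set of linearly independent measurements, and each row of the lambda matrix is the list of coefficients expressing $p_{\alpha_1}$ in terms of the fiducial $p_{k_1}$. The function names, labels and numbers below are illustrative assumptions. Exact rational arithmetic is used so that linear dependence can be tested without rounding.

```python
from fractions import Fraction as F

def solve_in_span(basis, target):
    """Coefficients c with sum_j c[j]*basis[j] == target, else None.

    Assumes `basis` is linearly independent; None is also returned if not."""
    n, dim = len(basis), len(target)
    # One row per coordinate: [basis columns | target].
    rows = [[F(basis[j][i]) for j in range(n)] + [F(target[i])]
            for i in range(dim)]
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, dim) if rows[i][c] != 0), None)
        if pivot is None:
            return None
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for i in range(dim):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    if any(row[-1] != 0 for row in rows[r:]):
        return None  # inconsistent: target lies outside the span
    return [rows[i][-1] for i in range(n)]

def find_omega_set(table):
    """Greedily keep each measurement that is independent of those kept so far."""
    omega = []
    for label, column in table.items():
        if solve_in_span([table[k] for k in omega], column) is None:
            omega.append(label)
    return omega

def lambda_matrix(table, omega):
    """Row alpha holds the coefficients Lambda_alpha^k of equation (43)."""
    basis = [table[k] for k in omega]
    return {alpha: solve_in_span(basis, col) for alpha, col in table.items()}

# Hypothetical data: each list gives p_alpha under three different preparations.
table = {
    "heads":      [F(1, 2), F(1, 8), F(1, 4)],
    "tails":      [F(1, 4), F(1, 2), F(1, 4)],
    "either":     [F(3, 4), F(5, 8), F(1, 2)],   # heads + tails
    "half_heads": [F(1, 4), F(1, 16), F(1, 8)],  # heads / 2
}
omega = find_omega_set(table)
lam = lambda_matrix(table, omega)
```

In this toy table the omega set has two elements, and the rows of the lambda matrix belonging to the fiducial measurements themselves come out as unit vectors.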



We will sometimes drop the $\alpha$'s, writing $\mathbf{r}$ with components $r_{k_1}$. The $\alpha$'s are
then understood to be implicit. Note it also follows from the definition of the
$\Lambda$ matrix that

$$\Lambda_{k'_1}^{k_1} = \delta_{k'_1}^{k_1} \quad \text{for } k'_1, k_1 \in \Omega_1 \qquad (45)$$

where $\delta_{k'_1}^{k_1}$ equals 1 if the subscript is equal to the superscript and 0 otherwise.

17 Composite Regions



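Now consider two disjoint regions $R_1$ and $R_2$ in $R$ (see Fig. 5(b)). We have

$$\begin{aligned}
\mathrm{prob}(&X_{R_1} \cup X_{R_2} \cup X_{R-R_1-R_2} \mid F_{R_1} \cup F_{R_2} \cup F_{R-R_1-R_2}) \\
&= \mathbf{r}(R_1) \cdot \mathbf{p}(R_1) \\
&= \sum_{k_1 \in \Omega_1} r_{k_1}(R_1)\, p_{k_1}(R_1) \\
&= \sum_{k_1 \in \Omega_1} r_{k_1}(R_1)\, \mathbf{r}(R_2) \cdot \mathbf{p}_{k_1}(R_2) \\
&= \sum_{k_1 k_2 \in \Omega_1 \times \Omega_2} r_{k_1} r_{k_2}\, p_{k_1 k_2}
\end{aligned} \qquad (46)$$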



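where $\mathbf{p}_{k_1}(R_2)$ is the state in $R_2$ given the generalized preparation
$(X_{R_1}^{k_1} \cup X_{R-R_1-R_2},\; F_{R_1}^{k_1} \cup F_{R-R_1-R_2})$ in region $R - R_2$ and where

$$p_{k_1 k_2} = \mathrm{prob}(X_{R_1}^{k_1} \cup X_{R_2}^{k_2} \cup X_{R-R_1-R_2} \mid F_{R_1}^{k_1} \cup F_{R_2}^{k_2} \cup F_{R-R_1-R_2}) \qquad (47)$$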



This means that we can write the probability for any measurement for the
composite region $R_1 \cup R_2$ in terms of a linear sum of joint probabilities $p_{k_1 k_2}$
with $k_1 k_2 \in \Omega_1 \times \Omega_2$. It may even be the case that we do not need all of these
probabilities. There may be some further compression (though still maintaining
that we have a linear sum). Hence we have the result that a fiducial set of
measurements for the composite region $R_1 \cup R_2$ is given by

$$(X_{R_1}^{k_1} \cup X_{R_2}^{k_2},\; F_{R_1}^{k_1} \cup F_{R_2}^{k_2}) \quad \text{for } k_1 k_2 \in \Omega_{12} \subseteq \Omega_1 \times \Omega_2 \qquad (48)$$

This result is central to the whole approach in this paper.

If the behaviour in the two regions is not causally connected then we expect
that $\Omega_{12} = \Omega_1 \times \Omega_2$. On the other hand, if there is a strong causal connection
then we can have $|\Omega_{12}| = |\Omega_1| = |\Omega_2|$. The relationships between these sets
give us a combinatoric handle on the causal structure. We seek, however, a
more quantitative handle.

To this end we define $\Lambda_{l_1 l_2}^{k_1 k_2}$ by

$$\mathbf{r}_{l_1 l_2} = \sum_{k_1 k_2 \in \Omega_{12}} \Lambda_{l_1 l_2}^{k_1 k_2}\, \mathbf{r}_{k_1 k_2} \quad \text{for } l_1 l_2 \in \Omega_1 \times \Omega_2 \qquad (49)$$



We will adopt the convention of labelling the elements of the post-compression
omega set with $k$'s and the elements of the pre-compression product set with $l$'s
as in this equation. As before, the fiducial measurements are represented by

$$\mathbf{r}_{k_1 k_2} = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \quad \text{for all } k_1 k_2 \in \Omega_{12} \qquad (50)$$

where the 1 is in the $k_1 k_2$ position. It follows by taking the $k_1 k_2$ component of
(49) that

$$\mathbf{r}_{l_1 l_2}\big|_{k_1 k_2} = r_{l_1 l_2}^{k_1 k_2} = \Lambda_{l_1 l_2}^{k_1 k_2} \qquad (51)$$

This is similar to (44) above.

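We can use lambda matrices to calculate an arbitrary $\mathbf{r}$ for the composite
system. To see this we start by putting (44) in (46) and reinserting $\alpha$'s

$$\begin{aligned}
\mathbf{r}_{\alpha_1 \alpha_2} \cdot \mathbf{p}
&= \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2}\, p_{l_1 l_2} && (52) \\
&= \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2}\, \mathbf{r}_{l_1 l_2} \cdot \mathbf{p} && (53) \\
&= \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2} \sum_{k_1 k_2 \in \Omega_{12}} \Lambda_{l_1 l_2}^{k_1 k_2}\, \mathbf{r}_{k_1 k_2} \cdot \mathbf{p} && (54)
\end{aligned}$$

Since this must be true for all $\mathbf{p}$ we have,

$$\mathbf{r}_{\alpha_1 \alpha_2} = \sum_{k_1 k_2 \in \Omega_{12}} \Big( \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2} \Lambda_{l_1 l_2}^{k_1 k_2} \Big)\, \mathbf{r}_{k_1 k_2} \qquad (55)$$

and hence, in view of (50) above, the components of $\mathbf{r}_{\alpha_1 \alpha_2}$ are

$$\mathbf{r}_{\alpha_1 \alpha_2}\big|_{k_1 k_2} = \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2} \Lambda_{l_1 l_2}^{k_1 k_2} \qquad (56)$$

This is consistent with (51) above given (45).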
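To spell out this consistency check as a short worked step: take the measurements in (56) to be fiducial ones, $\alpha_1 = l'_1$ and $\alpha_2 = l'_2$ with $l'_1 l'_2 \in \Omega_1 \times \Omega_2$ (the primed labels are introduced here only for illustration). By (45) each local lambda matrix then reduces to a delta, so the double sum collapses to a single term and (51) is recovered.

```latex
\mathbf{r}_{l'_1 l'_2}\big|_{k_1 k_2}
  = \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2}
      \delta_{l'_1}^{l_1}\, \delta_{l'_2}^{l_2}\, \Lambda_{l_1 l_2}^{k_1 k_2}
  = \Lambda_{l'_1 l'_2}^{k_1 k_2}
```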

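We can generalize this for $N$ regions. We define lambda matrices for
multi-region composites by

$$\mathbf{r}_{l_1 \cdots l_N} = \sum_{k_1 \cdots k_N \in \Omega_{12 \cdots N}} \Lambda_{l_1 \cdots l_N}^{k_1 \cdots k_N}\, \mathbf{r}_{k_1 \cdots k_N} \quad \text{for } l_1 \cdots l_N \in \Omega_1 \times \cdots \times \Omega_N \qquad (57)$$

And then it is easy to show that

$$\mathbf{r}_{\alpha_1 \cdots \alpha_N}\big|_{k_1 \cdots k_N} = \sum_{l_1 \cdots l_N \in \Omega_1 \times \cdots \times \Omega_N} \Lambda_{\alpha_1}^{l_1} \Lambda_{\alpha_2}^{l_2} \cdots \Lambda_{\alpha_N}^{l_N}\, \Lambda_{l_1 \cdots l_N}^{k_1 \cdots k_N} \qquad (58)$$

Hence, given the lambda matrices we have a way of calculating the components
of $\mathbf{r}$ vectors for one region (44), two regions (56), and multi-regions (58).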



18 The Causaloid 

In the previous section we discussed composite regions made from regions $R_1$,
$R_2$, $\ldots$. The smallest component regions are the elementary regions $R_x$. Any
region, $R_{\mathcal{O}}$, is composed of elementary regions. A general measurement in this
region is labelled with $\alpha_{\mathcal{O}}$. But since a general measurement decomposes into
local measurements at each component elementary region we can write

$$\alpha_{\mathcal{O}} \equiv \alpha_x \alpha_{x'} \cdots \alpha_{x''} \quad \text{where } \mathcal{O} = \{x, x', \ldots, x''\} \qquad (59)$$

For each of these elementary regions we will have a local lambda matrix $\Lambda_{\alpha_x}^{k_x}(x, \Omega_x)$
with $k_x \in \Omega_x$. We include the argument $x$ for clarity and the argument $\Omega_x$ since
the choice of omega set is not, in general, unique. For the region $R_{\mathcal{O}}$ we have
lambda matrix

$$\Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}}) \qquad (60)$$

where

$$l_{\mathcal{O}} \equiv l_x l_{x'} \cdots l_{x''} \in \Omega_x \times \Omega_{x'} \times \cdots \times \Omega_{x''} \quad \text{where } \mathcal{O} = \{x, x', \ldots, x''\} \qquad (61)$$

and

$$k_{\mathcal{O}} \equiv k_x k_{x'} \cdots k_{x''} \in \Omega_{\mathcal{O}} \quad \text{where } \mathcal{O} = \{x, x', \ldots, x''\} \qquad (62)$$

We will sometimes have reason to consider a lambda matrix as in (60) but where
$\mathcal{O} = \{x\}$. That is $\Lambda_{l_x}^{k_x}$. By convention $l$'s are in the product set and $k$'s are in
the new omega set. But in this case there is only one omega set, namely $\Omega_x$, to
take a product over. Thus, we have

$$\Lambda_{l_x}^{k_x} = \delta_{l_x}^{k_x} \quad \text{with } l_x, k_x \in \Omega_x \qquad (63)$$

It is worth noting that if we know $\Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}})$ for one omega set then, since
the lambda matrix contains all relevant linear dependencies, we can calculate
(i) all other omega sets and (ii) the lambda matrix for any other omega set for
the given region (we will spell out the method for doing this in Sec. 24). Hence,
it is enough to know the lambda matrix for one omega set for each region. The
lambda matrices can be used to calculate an arbitrary measurement vector using
the results of the previous sections applied to elementary regions. From (44)
and (58)

$$r_{\alpha_{\mathcal{O}}}^{k_{\mathcal{O}}} \equiv \mathbf{r}_{\alpha_{\mathcal{O}}}\big|_{k_{\mathcal{O}}} = \sum_{l_{\mathcal{O}}} \Lambda_{\alpha_x}^{l_x} \Lambda_{\alpha_{x'}}^{l_{x'}} \cdots \Lambda_{\alpha_{x''}}^{l_{x''}}\, \Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}} \qquad (64)$$

We now come to the central mathematical object in the approach to be taken 
in this paper. 

The causaloid for a predictively well defined region $R$ made up
of elementary regions $R_x$ is defined to be that thing represented by
any mathematical object which can be used to obtain $\mathbf{r}_{\alpha_{\mathcal{O}}}(R_{\mathcal{O}})$ for
all measurements $\alpha_{\mathcal{O}}$ in region $R_{\mathcal{O}}$ for all $R_{\mathcal{O}} \subseteq R$.

In view of the above results, one mathematical object which specifies the
causaloid is

$$\begin{aligned}
&\Lambda_{\alpha_x}^{k_x}(x, \Omega_x) &&: \text{for one } \Omega_x \text{ for each } R_x \\
&\Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}}) &&: \text{for one } \Omega_{\mathcal{O}} \text{ for each non-elementary } R_{\mathcal{O}} \subseteq R
\end{aligned} \qquad (65)$$

This lists all $\Lambda$ matrices. However, we might expect any given physical theory to
have some structure such that some $\Lambda$ matrices can be calculated from others.
If this is the case then we might expect that we can take some subset of the $\Lambda$
matrices, labelled by $j$, and write

$$\boldsymbol{\Lambda} = [\Lambda(j) : j = 1 \text{ to } J \mid \text{RULES}] \qquad (66)$$

where RULES are rules for deducing a general $\Lambda$ from the given $\Lambda$'s. We will see
that we can achieve quite considerable compression of this nature in the cases
of CProbT and QT.



19 The causaloid product 

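As we noted in Sec. 2 there are three basic ways of putting two operators
together in quantum theory: $AB$, $[A?B]$, and $A \otimes B$. We noted there that it would
be desirable to treat these on an equal footing. To this end we now define the
causaloid product for our framework. At this stage we are working in a general
framework. However, we will see later that this unifies all these types of product
for quantum theory. Let $\mathbf{r}_{\alpha_1}$ be a measurement vector in $R_1$ (corresponding to
$\mathcal{O}_1$) and let $\mathbf{r}_{\alpha_2}$ be a measurement vector in $R_2$ (corresponding to $\mathcal{O}_2$) and let
the regions $R_1$ and $R_2$ be non-overlapping. We define the causaloid product $\otimes^{\Lambda}$
by

$$\mathbf{r}_{\alpha_1 \alpha_2} = \mathbf{r}_{\alpha_1} \otimes^{\Lambda} \mathbf{r}_{\alpha_2} \qquad (67)$$

Strictly we should write

$$(\mathbf{r}_{\alpha_1 \alpha_2}, \mathcal{O}_1 \cup \mathcal{O}_2) = (\mathbf{r}_{\alpha_1}, \mathcal{O}_1) \otimes^{\Lambda} (\mathbf{r}_{\alpha_2}, \mathcal{O}_2) \qquad (68)$$

since the causaloid product needs to know which region it is addressing but for
brevity we will stick with (67), the regions being implicit in the labels $\alpha_1$ and $\alpha_2$.
The components of the LHS are obtained from the components of the vectors
on the RHS by applying (44) and (56)

$$\mathbf{r}_{\alpha_1 \alpha_2}\big|_{k_1 k_2} = \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2} (\mathbf{r}_{\alpha_1}|_{l_1})(\mathbf{r}_{\alpha_2}|_{l_2})\, \Lambda_{l_1 l_2}^{k_1 k_2} \qquad (69)$$

Now, the lambda matrix $\Lambda_{l_1 l_2}^{k_1 k_2}$ is given by the causaloid. To see this we can
reinsert the $\mathcal{O}$'s, writing it as $\Lambda_{l_{\mathcal{O}_1} l_{\mathcal{O}_2}}^{k_{\mathcal{O}_1} k_{\mathcal{O}_2}}$ which, in view of (61, 62), is the same as
$\Lambda_{l_{\mathcal{O}_1 \cup \mathcal{O}_2}}^{k_{\mathcal{O}_1 \cup \mathcal{O}_2}}$. Now, in fact the causaloid gives this lambda matrix for the $l$'s in the
product set over all the elementary $\Omega_x$'s (as in (61)) whereas we only
require $l$'s in the subset $\Omega_{\mathcal{O}_1} \times \Omega_{\mathcal{O}_2}$. We use only those components of $\Lambda_{l_{\mathcal{O}_1 \cup \mathcal{O}_2}}^{k_{\mathcal{O}_1 \cup \mathcal{O}_2}}$
we need to take the causaloid product.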
We wish to discuss two special cases. 

1. Omega sets multiply. $\Omega_{12} = \Omega_1 \times \Omega_2$ so that $|\Omega_{12}| = |\Omega_1||\Omega_2|$.

2. Omega sets do not multiply. $\Omega_{12} \subset \Omega_1 \times \Omega_2$ so that $|\Omega_{12}| < |\Omega_1||\Omega_2|$.

First note that it follows immediately from the definition of the lambda matrix
in (49) that

$$\Lambda_{l_1 l_2}^{k_1 k_2} = \delta_{l_1 l_2}^{k_1 k_2} \quad \text{for } l_1 l_2, k_1 k_2 \in \Omega_{12} \qquad (70)$$

where $\delta_{l_1 l_2}^{k_1 k_2}$ equals 1 if the subscripts and superscripts are equal and 0 otherwise.
Hence, we can write (69) as

$$\mathbf{r}_{\alpha_1 \alpha_2}\big|_{k_1 k_2} = (\mathbf{r}_{\alpha_1}|_{k_1})(\mathbf{r}_{\alpha_2}|_{k_2}) + \sum_{l_1 l_2 \in \Omega_1 \times \Omega_2 - \Omega_{12}} (\mathbf{r}_{\alpha_1}|_{l_1})(\mathbf{r}_{\alpha_2}|_{l_2})\, \Lambda_{l_1 l_2}^{k_1 k_2} \qquad (71)$$

It follows that

$$\text{if } \Omega_{12} = \Omega_1 \times \Omega_2 \text{ then } \mathbf{r}_{\alpha_1 \alpha_2} = \mathbf{r}_{\alpha_1} \otimes \mathbf{r}_{\alpha_2} \qquad (72)$$

where $\otimes$ denotes the ordinary tensor product. Hence we see that the ordinary
tensor product is a special case of the causaloid product when the omega sets
multiply. We will see that, in quantum theory, typically omega sets will multiply.
This corresponds to the products $[A?B]$ and $A \otimes B$ from quantum theory which
have the property that the total number of real parameters after taking the
product is equal to the product of the number from each operator (so we have
$|\Omega_{12}| = |\Omega_1||\Omega_2|$). Omega sets do not multiply when we have strong causal
dependence so that what happens in one region depends, at least partially, on
what is done in the other region in a way that cannot be altered by what is done
in the rest of $R$. In quantum theory we see this when we take the product $AB$.
Then the total number of real parameters in the product is equal to the number
in $A$ and $B$ separately (this is basically $|\Omega_{12}| = |\Omega_1| = |\Omega_2|$). Typically strong
causal dependence indicates that two regions are sufficiently "close" that what
is done outside of these regions cannot interfere with the causal dependence.
We will say that the two regions are causally adjacent in these cases.
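The component formula for the causaloid product can be illustrated with a small sketch. It assumes a dictionary-based representation that is not part of the paper: measurement vectors are dictionaries from fiducial labels to components, the composite omega set is a list of label pairs, and the lambda matrix is stored only for the pairs that were compressed away (pairs inside the composite omega set contribute a delta). The function name and all numbers are hypothetical and chosen only to exercise the two special cases.

```python
from fractions import Fraction as F
from itertools import product

def causaloid_product(r1, r2, omega12, lam):
    """Components of r1 (x)^Lambda r2 on the composite omega set omega12.

    r1, r2  : dicts mapping fiducial labels l1 / l2 to components.
    omega12 : list of pairs (k1, k2) forming the composite omega set.
    lam     : dict mapping each pair (l1, l2) NOT in omega12 to a dict
              {(k1, k2): coefficient}; pairs in omega12 act as a delta.
    """
    in_omega12 = set(omega12)
    out = {k: F(0) for k in omega12}
    for l1, l2 in product(r1, r2):
        weight = r1[l1] * r2[l2]
        if (l1, l2) in in_omega12:
            out[(l1, l2)] += weight           # delta contribution
        else:
            for k, coeff in lam[(l1, l2)].items():
                out[k] += weight * coeff      # contribution via the lambda matrix
    return out

r1 = {1: F(1, 2), 2: F(1, 3)}
r2 = {1: F(2), 2: F(5)}

# Case 1: omega sets multiply, so no lambda entries are needed and the
# result is the ordinary tensor product.
tensor = causaloid_product(r1, r2, [(1, 1), (1, 2), (2, 1), (2, 2)], {})

# Case 2: omega sets do not multiply (made-up lambda entries).
toy_lam = {
    (1, 2): {(2, 1): F(1)},
    (2, 2): {(1, 1): F(1), (2, 1): F(-1)},
}
compressed = causaloid_product(r1, r2, [(1, 1), (2, 1)], toy_lam)
```

In the first case the output has one component per pair of labels; in the second it has only as many components as the compressed omega set, which is the sense in which the lambda matrix encodes the causal connection.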



20 Using the causaloid to make predictions 

The causaloid is so named because it is an object which gives us a quantitative
handle on the causal structure as was seen in the previous section. What is
surprising is that, given the causaloid, we can calculate any probability
pertaining to the predictively well defined region $R$ so long as that probability is well
defined. To see this note that if we have disjoint regions $R_1$ and $R_2$ we can
write

$$\mathrm{prob}(X_{R_1} \mid X_{R_2}, F_{R_1}, F_{R_2}) = \frac{\mathbf{r}_{(X_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})} \cdot \mathbf{p}(R_1 \cup R_2)}{\sum_{Y_{R_1}} \mathbf{r}_{(Y_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})} \cdot \mathbf{p}(R_1 \cup R_2)} \qquad (73)$$

(this is basically Bayes' formula). The sum in the denominator is over all possible
stacks $Y_{R_1}$ in $R_1$ consistent with $F_{R_1}$, that is all $Y_{R_1} \subseteq F_{R_1}$. All probabilities
pertaining to region $R$ are of this form. Now, if this probability is well defined
then it does not depend on what is outside $R_1 \cup R_2$. Hence, it does not depend
on the state $\mathbf{p}(R_1 \cup R_2)$. The space of possible states spans the full
dimensionality of the vector space by definition since we have a minimal fiducial set of
measurements specifying the components of the state. Hence

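The probability

$$\mathrm{prob}(X_{R_1} \mid X_{R_2}, F_{R_1}, F_{R_2})$$

is well defined if and only if

$$\mathbf{v} \equiv \mathbf{r}_{(X_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})}$$

is parallel to

$$\mathbf{u} \equiv \sum_{Y_{R_1} \subseteq F_{R_1}} \mathbf{r}_{(Y_{R_1}, F_{R_1})} \otimes^{\Lambda} \mathbf{r}_{(X_{R_2}, F_{R_2})}$$

and this probability is given by

$$\mathrm{prob}(X_{R_1} \mid X_{R_2}, F_{R_1}, F_{R_2}) = \frac{|\mathbf{v}|}{|\mathbf{u}|} \qquad (74)$$

where $|\mathbf{a}|$ denotes the length of the vector $\mathbf{a}$.

In fact, since the two vectors are parallel, we can simply take the ratio of any
given component of the two vectors as long as the denominator is nonzero. We
can write $\mathbf{v} = p\mathbf{u}$ where $p$ is the above probability.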
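A minimal sketch of this test, assuming the numerator and denominator vectors have already been computed and are given as lists of exact rationals (the function name and the sample vectors are illustrative, not from the paper). It fixes the candidate ratio from the first nonzero component of the denominator vector and then checks that every component obeys the same ratio. With experimental or floating-point data a tolerance would be needed instead of exact equality.

```python
from fractions import Fraction as F

def conditional_probability(v, u):
    """Return p with v == p*u if v is parallel to u, otherwise None."""
    # Use the first nonzero component of u to fix the candidate ratio.
    idx = next((i for i, x in enumerate(u) if x != 0), None)
    if idx is None:
        return None  # the denominator vector vanishes
    p = F(v[idx]) / F(u[idx])
    # Parallel means every component satisfies v_i = p * u_i.
    if all(F(vi) == p * F(ui) for vi, ui in zip(v, u)):
        return p
    return None

u = [F(1, 2), F(1), F(0)]
well_defined = conditional_probability([F(1, 4), F(1, 2), F(0)], u)
not_defined = conditional_probability([F(1, 4), F(1, 2), F(1, 8)], u)
```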

One concern might be that it will be a rare state of affairs that these
vectors are parallel and so the formalism is rarely useful. Since we have only set
ourselves the task of calculating probabilities when they are well defined we are
not compelled to address this. However, it turns out that the situation is not
so bad. In fact we can always make $R_2$ big enough that we have well defined
probabilities. To see this consider the extreme case where $R_2 = R - R_1$. Then
we have $\mathbf{p}(R_1 \cup R_2) = \mathbf{p}(R)$. Now, we only have one preparation for the
predictively well defined region $R$, namely the condition $C$ being true on the cards
outside $R$. Since we are always taking this to be true we can only have one state.
This means that it must be specified by a single component, that is $\mathbf{p}(R) = (p_1)$
where $(p_1)$ is a single component vector. The number $p_1$ cancels out in (73) and
so the probability is well defined.

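If the two vectors are not exactly parallel the probability will not be well
defined but it may be approximately well defined. Indeed, we can use (73)
to place bounds on the probability. Define $\mathbf{v}^{\parallel}$ and $\mathbf{v}^{\perp}$ as the components of $\mathbf{v}$
parallel and perpendicular to $\mathbf{u}$ respectively. Then it is easy to show that

$$\frac{|\mathbf{v}^{\parallel}|}{|\mathbf{u}|} - \frac{|\mathbf{v}^{\perp}|}{|\mathbf{v}| \cos\phi} \le \mathrm{prob}(X_{R_1} \mid X_{R_2}, F_{R_1}, F_{R_2}) \le \frac{|\mathbf{v}^{\parallel}|}{|\mathbf{u}|} + \frac{|\mathbf{v}^{\perp}|}{|\mathbf{v}| \cos\phi} \qquad (75)$$

where $\phi$ is the angle between $\mathbf{v}$ and $\mathbf{v}^{\perp}$ (we get these bounds using $|\mathbf{v} \cdot \mathbf{p}| \le |\mathbf{u} \cdot \mathbf{p}|$
and considering $\mathbf{p}$ parallel to $\mathbf{v}^{\perp}$).

21 Physical theories and the causaloid formalism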



In the causaloid formalism 

1. We have the causaloid, A, which is theory specific. The causaloid depends 
on the boundary conditions C outside R. These might only be relevant 
when we are "close" to the boundary (QT appears to be of this nature). 
In this case, modulo what happens at the boundary, we can say that the 
causaloid fully characterizes the physical theory (at least its predictive 
component). 

2. We have some basic equations which are theory non-specific. These are
(64) for calculating a general $\mathbf{r}$ from the causaloid, (69) for forming the
causaloid product, and Bayes' formula in the form given in (73) above (or
we can use (74) given that appropriate conditions are satisfied).

Given the theory specific part and the theory independent equations we can 
make predictions. This framework is very general and we would expect any 
physical theory to fit into it (perhaps with some minor modifications concerning 
the way the data is collected onto cards) . Hence we see that we have a potentially 
very powerful formalism in which the theory specifics are all put into one object. 
This is likely to be useful if we are hoping to combine different physical theories. 

22 The open causaloid 

In typical situations we will have some elementary regions which can be re- 
garded as being at the boundary. Typically we might expect to have to use 
special mathematical techniques to deal with these. However, if the region R is 
sufficiently big then we are most likely to be interested in probabilities which do
not pertain to these boundary regions. For this reason we define the open
causaloid.

The open causaloid for a predictively well defined region $R$ made
up of elementary regions $R_x$ with boundary elementary regions $R_{x_b}$
with $x_b \in \mathcal{O}_b$ is defined to be that thing represented by any
mathematical object which can be used to obtain $\mathbf{r}_{\alpha_{\mathcal{O}}}(R_{\mathcal{O}})$ for all
measurements $\alpha_{\mathcal{O}}$ in region $R_{\mathcal{O}}$ for all $R_{\mathcal{O}} \subseteq R - R_{\mathcal{O}_b}$.

We can use the open causaloid to calculate all well defined probabilities exclud- 
ing those which pertain to the boundary. If we make the region R sufficiently 
big then we can be sure that any regions like $R_1$ and $R_2$ of interest (and for
which we want to calculate conditional probabilities) do not overlap with the
boundary regions. In this case the open causaloid is as useful as the
causaloid itself. Indeed, given that we have already restricted our attention to $R$ in
defining the causaloid, it does not matter much if we restrict slightly further to
$R - R_{\mathcal{O}_b}$. In view of the remarks in the last section, the open causaloid is likely
to be characteristic of the physical theory without being especially influenced by
boundary conditions outside $R$. We can, further, envisage letting the boundary
tend to infinity so that the open causaloid and the causaloid become equivalent.

23 Some results concerning lambda matrices 

The causaloid is either specified by giving all lambda matrices or just giving a 
few lambda matrices and then using some RULES to calculate all others. If we 
want to use such RULES then it will be useful to have some results relating 
lambda matrices. 

First we note that when omega sets multiply so do lambda matrices.

$$\Lambda_{l_1 l_2}^{k_1 k_2} = \Lambda_{l_1}^{k_1} \Lambda_{l_2}^{k_2} \quad \text{if } \Omega_{12} = \Omega_1 \times \Omega_2 \qquad (76)$$

where the $l_1$ might belong to any subset of the full set of allowed measurements
(that are labelled by $\alpha_1$), and likewise for $l_2$. This follows from (72) using (44)
and restricting to the given sets for the $l$'s.
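Next we give the following result.

$$\Lambda_{l_1 l_2 l_3}^{k_1 k_2 k_3} = \sum_{k'_2 \in \Omega_{2\not{3}}} \Lambda_{l_1 k'_2}^{k_1 k_2}\, \Lambda_{l_2 l_3}^{k'_2 k_3} \quad \text{if } \Omega_{123} = \Omega_{12} \times \Omega_{\not{2}3} \text{ and } \Omega_{23} = \Omega_{2\not{3}} \times \Omega_{\not{2}3} \qquad (77)$$

where the notation $\Omega_{\not{2}3}$ means that we form the set of all $k_3$ for which there
exists $k_2 k_3 \in \Omega_{23}$. This generalizes as

$$\Lambda_{l_1 l_2 l_3 l_4}^{k_1 k_2 k_3 k_4} = \sum_{k'_2 \in \Omega_{2\not{3}},\, k'_3 \in \Omega_{3\not{4}}} \Lambda_{l_1 k'_2}^{k_1 k_2}\, \Lambda_{l_2 k'_3}^{k'_2 k_3}\, \Lambda_{l_3 l_4}^{k'_3 k_4} \quad \text{if } \begin{cases} \Omega_{1234} = \Omega_{12} \times \Omega_{\not{2}3} \times \Omega_{\not{3}4} \\ \Omega_{23} = \Omega_{2\not{3}} \times \Omega_{\not{2}3} \\ \Omega_{34} = \Omega_{3\not{4}} \times \Omega_{\not{3}4} \end{cases} \qquad (78)$$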

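and so on. We will now prove (77) (we obtain (78) using the same proof
technique). We have

$$\begin{aligned}
\mathbf{r}_{l_1 l_2 l_3} \cdot \mathbf{p} &= \mathbf{r}_{l_2 l_3} \cdot \mathbf{p}_{l_1} && (79) \\
&= \sum_{k'_2 k_3 \in \Omega_{23}} \Lambda_{l_2 l_3}^{k'_2 k_3}\, \mathbf{r}_{k'_2 k_3} \cdot \mathbf{p}_{l_1} && (80) \\
&= \sum_{k'_2 k_3 \in \Omega_{23}} \Lambda_{l_2 l_3}^{k'_2 k_3}\, \mathbf{r}_{l_1 k'_2 k_3} \cdot \mathbf{p} && (81) \\
&= \sum_{k'_2 k_3 \in \Omega_{23}} \sum_{k_1 k_2 \in \Omega_{12}} \Lambda_{l_2 l_3}^{k'_2 k_3}\, \Lambda_{l_1 k'_2}^{k_1 k_2}\, \mathbf{r}_{k_1 k_2 k_3} \cdot \mathbf{p} && (82)
\end{aligned}$$

where $\mathbf{p}_{l_1}$ is the state for region $R_2 \cup R_3$ given that the preparation was
$(X_{R_1}^{l_1} \cup X_{R-R_1-R_2-R_3},\; F_{R_1}^{l_1} \cup F_{R-R_1-R_2-R_3})$ in $R - R_2 - R_3$. Also note that we have
used the same method in the last line that was used in the first three lines but
for $\mathbf{p}_{k_3}$. Since this is true for any $\mathbf{p}$ we have

$$\mathbf{r}_{l_1 l_2 l_3} = \sum_{k_1 k_2 \in \Omega_{12}} \sum_{k'_2 k_3 \in \Omega_{23}} \Lambda_{l_1 k'_2}^{k_1 k_2}\, \Lambda_{l_2 l_3}^{k'_2 k_3}\, \mathbf{r}_{k_1 k_2 k_3} \qquad (83)$$

We also have, by definition,

$$\mathbf{r}_{l_1 l_2 l_3} = \sum_{k_1 k_2 k_3 \in \Omega_{123}} \Lambda_{l_1 l_2 l_3}^{k_1 k_2 k_3}\, \mathbf{r}_{k_1 k_2 k_3} \qquad (84)$$

Comparing the last two equations gives us (77).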

We have found some relations between lambda matrices when these lambda 
matrices have certain properties (such as having omega sets which satisfy the 
given properties). This proves that we do not have complete freedom to choose 
lambda matrices independently of one another. It should be possible to charac- 
terize all possible relationships between lambda matrices so we know how much 
freedom we have in specifying the causaloid. These constraints are likely to give 
us deep insight into the possible nature of physical theories. 

24 Transforming lambda matrices 

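We write $\Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}})$. We noted in Sec. 18 that since the lambda matrix
contains all relevant linear dependencies we can calculate (i) all other omega sets
and (ii) the lambda matrix for any other omega set. For example, we might
want to check that $\Omega'_{\mathcal{O}}$ is an omega set and then calculate $\Lambda_{l_{\mathcal{O}}}^{k'_{\mathcal{O}}}(\mathcal{O}, \Omega'_{\mathcal{O}})$. To do
this is easy. First we form the square matrix

$$\Lambda_{k'_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}}), \quad k'_{\mathcal{O}} \in \Omega'_{\mathcal{O}} \qquad (85)$$

If this has an inverse then $\Omega'_{\mathcal{O}}$ is an omega set. Then it is easy to show that

$$\Lambda_{l_{\mathcal{O}}}^{k'_{\mathcal{O}}}(\mathcal{O}, \Omega'_{\mathcal{O}}) = \sum_{k_{\mathcal{O}}} \big[\Lambda_{k'_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}})\big]^{-1} \Lambda_{l_{\mathcal{O}}}^{k_{\mathcal{O}}}(\mathcal{O}, \Omega_{\mathcal{O}}) \qquad (86)$$

by considering the equations by which the lambda matrices are defined. Similar
remarks apply to local lambda matrices $\Lambda_{\alpha_x}^{k_x}(x, \Omega_x)$.

Note that if $\Omega'_{\mathcal{O}} = \Omega_{\mathcal{O}}$ then the matrix in (85) is equal to the identity.
Indeed, this is how we can deduce the omega set from the lambda matrix.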
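The change of omega set can be sketched as follows, assuming (as an illustrative representation, not the paper's) that a lambda matrix is stored as a dictionary from measurement labels to dictionaries of coefficients over the current omega set. The candidate omega set is accepted only if the square matrix built from its rows is invertible, and the new coefficients follow by multiplying with that inverse. The function names, labels and numbers are hypothetical.

```python
from fractions import Fraction as F

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix; returns None if singular."""
    n = len(matrix)
    aug = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if pivot is None:
            return None
        aug[c], aug[pivot] = aug[pivot], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def change_omega_set(lam, omega, new_omega):
    """Lambda matrix relative to new_omega, or None if it is not an omega set."""
    if len(new_omega) != len(omega):
        return None
    # Square matrix with rows k' in new_omega and columns k in omega.
    square = [[lam[kp][k] for k in omega] for kp in new_omega]
    inv = invert(square)
    if inv is None:
        return None
    # new_lam[l][j] = sum_k lam[l][k] * inv[k][k'_j]
    return {l: [sum(lam[l][k] * inv[i][j] for i, k in enumerate(omega))
                for j in range(len(new_omega))]
            for l in lam}

# Hypothetical lambda matrix over the omega set ["heads", "tails"].
omega = ["heads", "tails"]
lam = {
    "heads":      {"heads": F(1), "tails": F(0)},
    "tails":      {"heads": F(0), "tails": F(1)},
    "either":     {"heads": F(1), "tails": F(1)},
    "half_heads": {"heads": F(1, 2), "tails": F(0)},
}
new_lam = change_omega_set(lam, omega, ["either", "half_heads"])
rejected = change_omega_set(lam, omega, ["heads", "half_heads"])
```

Here the pair of measurements "either" and "half_heads" is accepted as an omega set, while "heads" and "half_heads" is rejected because the two are proportional.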

25 Introducing time 
25.1 Foliations 

It is common in physics to think of a state evolving in "time". We will show
how we can recover this notion in the causaloid formalism. This will be a useful
exercise if we wish to make contact with those physical theories, such as QT,
that take the notion of an evolving state as fundamental. We will find, however,
that this formalism admits a much more general framework for evolving states.
In particular there is no requirement that the time slices are space-like.

If we wish to think of the state evolving in the region $R$ then we must
introduce a time label $t = 0, 1, \ldots, T$ and a set of nested subsets of $R$

$$R = R(0) \supset R(1) \supset \cdots \supset R(t) \supset R(t+1) \supset \cdots \supset R(T) = \emptyset \qquad (87)$$

We will call this a foliation. It is a feature of the present approach that we are
free to allocate the nested set in this foliation any way we wish, even ways that
would not correspond to our usual notion of time. However, we expect that
certain foliations will be well behaved, namely those that correspond to a good
choice of $t$. We define an elementary time-slice

$$R_t = R(t) - R(t+1) \qquad (88)$$

The elementary time-slice $R_t$ consists of all the cards in $R$ between times $t$ and
$t+1$. Note the notational difference between $R(t)$ and $R_t$. We use an argument
to denote what happens after time $t$ and a subscript to denote what happens
between times $t$ and $t+1$.

25.2 States evolving in time 

We can write

$$\mathbf{p}(t) \equiv \mathbf{p}(R(t)) \qquad (89)$$

for "the state at time $t$". Given this state we can calculate probabilities for
what happens at times after $t$. If we know the causaloid then we can find omega
sets $\Omega(t) \equiv \Omega_{R(t)}$. The components of the state are $p_{k(t)}$ with $k(t) \in \Omega(t)$. The
notation $k(t)$ is a little unnatural, $k_t$ would be more natural, but we reserve
this for elementary time-slices. We understand the $t$ argument on $k(t)$ to tell us
that these $k$'s are in the omega set $\Omega(t)$.

The state will evolve from time $t$ to time $t+1$. The transformation will
depend on what was done and what was seen in the elementary time-slice $R_t$.
We denote this by $(X_t, F_t)$ and have the associated vector $\mathbf{r}_{(X_t, F_t)}(R_t)$. Since
$R(t) = R_t \cup R(t+1)$ we can calculate the components of $\mathbf{p}(t+1)$ from $\mathbf{p}(t)$.

$$p_{k(t+1)} = \mathbf{r}_{k(t+1)}(t+1) \otimes^{\Lambda} \mathbf{r}_{(X_t, F_t)}(R_t) \cdot \mathbf{p}(t) \qquad (90)$$

where $k(t+1) \in \Omega(t+1)$. Hence we can write,

$$\mathbf{p}(t+1) = Z_{(X_t, F_t)}(t+1, t)\, \mathbf{p}(t) \qquad (91)$$

where the elements of the $|\Omega(t+1)| \times |\Omega(t)|$ real matrix $Z$ are given by

$$Z_{(X_t, F_t)}(t+1, t)_{k(t+1) k(t)} = \big[\mathbf{r}_{k(t+1)}(t+1) \otimes^{\Lambda} \mathbf{r}_{(X_t, F_t)}(R_t)\big]\big|_{k(t)} \qquad (92)$$

We can calculate these from the causaloid. We can label each $(X_t, F_t)$ in the
elementary time-slice $R_t$ with $\alpha_t$. Thus we will write the transformation matrix for
time $t$ to $t+1$ as $Z_{\alpha_t}$ or as $Z_t$ if we are suppressing the $\alpha$'s.

Since there is only one preparation for $R$ (namely that implicit in the
condition $C$) we know that the state at time $t = 0$ is

$$\mathbf{p}(0) = \mathbf{p}(R) = (p_1) \qquad (93)$$

where $(p_1)$ is a single component vector. Hence, we can calculate a general state
by

$$\mathbf{p}(t) = Z(t, 0)\, \mathbf{p}(0) \qquad (94)$$

with

$$Z(t, 0) = Z_{t-1} Z_{t-2} \cdots Z_0 \qquad (95)$$

We see that the causaloid provides us with a notion of a state evolving and tells
us how to evolve it. The only quantity left undetermined by the causaloid is
the component $p_1$. But this will cancel when we use Bayes' formula to calculate
conditional probabilities in $R$ and so need not be determined.
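The evolution rule and the cancellation of the undetermined component can be illustrated with a toy calculation. The matrices below are hypothetical numbers for an invented theory with two fiducial components and three time-slices; the first slice is represented by a column, the middle slice by one square matrix per outcome, and the last slice by a row, so that the full product is a single number. None of these values come from the paper.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def evolve(zs, p0):
    """Apply Z_0, then Z_1, ... to p(0); returns the state as a column matrix."""
    state = p0
    for z in zs:  # zs = [Z_0, Z_1, ..., Z_{t-1}]
        state = matmul(z, state)
    return state

# Hypothetical toy theory (illustrative numbers only).
Z0 = [[F(3, 4)], [F(1, 4)]]                       # first slice: 2 x 1
Z1 = {                                            # middle slice: 2 x 2 per outcome
    "a": [[F(1, 2), F(0)], [F(1, 4), F(1, 4)]],
    "b": [[F(1, 4), F(1, 4)], [F(0), F(1, 2)]],
}
Z2 = [[F(1), F(1)]]                               # last slice: 1 x 2

def joint(alpha, p1):
    """Joint probability Z_2 Z_alpha Z_0 p(0) with p(0) = (p1)."""
    return evolve([Z0, Z1[alpha], Z2], [[p1]])[0][0]

def conditional(alpha, p1):
    """Bayes' formula: normalise over the outcomes of the middle slice."""
    return joint(alpha, p1) / sum(joint(a, p1) for a in Z1)
```

Calling `conditional("a", p1)` with two different values of `p1` returns the same number, which is the cancellation described in the text.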

25.3 Obtaining lambda matrices from transformation matrices

We can write $(X_R, F_R)$ in $R$ as the union of $(X_t, F_t)$ in $R_t$ over $t = 0$ to $T-1$
or denote it with the collection of $\alpha$ labels $\alpha_{T-1} \cdots \alpha_0$. Then

$$\mathrm{prob}(X_R, F_R) = \mathrm{prob}(\alpha_{T-1} \cdots \alpha_0) = \mathbf{r}_{\alpha_{T-1}} \cdot Z_{\alpha_{T-2}} \cdots Z_{\alpha_0}\, \mathbf{p}(0) \qquad (96)$$

For notational simplicity we perform the following replacements

$$\mathbf{r}_{\alpha_{T-1}} \cdot \;\to\; Z_{\alpha_{T-1}} \quad \text{and} \quad Z_{\alpha_0}\, \mathbf{p}(0) \;\to\; Z_{\alpha_0} \qquad (97)$$

Hence we have (suppressing most $\alpha$'s)

$$\mathrm{prob}(X_t^{\alpha_t} \cup X_{R-R_t},\; F_t^{\alpha_t} \cup F_{R-R_t}) = p_{\alpha_t} = Z_{T-1} Z_{T-2} \cdots Z_{\alpha_t} \cdots Z_0 \qquad (98)$$

Corresponding to the elementary region $R_t$ is a state $\mathbf{p}_t$ with a generalized
preparation $(X_{R-R_t}, F_{R-R_t})$ in the region $R - R_t$. Note that the generalized
preparation contains a part to the past of $t$ and a part to the future of $t+1$
(at this level the framework is similar to the time-symmetric formulation of
Aharonov, Bergmann and Lebowitz [3]). The state $\mathbf{p}_t$ has components

$$p_{k_t} = Z_{T-1} \cdots Z_{k_t} \cdots Z_0 \qquad (99)$$

We can write an arbitrary $Z$ at time $t$ in terms of the linearly independent set
$Z_{k_t}$ with $k_t \in \Omega_t$. That is

$$Z_{\alpha_t} = \sum_{k_t \in \Omega_t} r_{\alpha_t}^{k_t}\, Z_{k_t} \qquad (100)$$

Putting (99) and (100) into (98) gives

$$p_{\alpha_t} = \mathbf{r}_{\alpha_t} \cdot \mathbf{p}_t \qquad (101)$$

as required. This justifies the use of $r^{k_t}$ in (100) and it justifies labelling the
linearly independent set of $Z_{k_t}$'s with $k_t \in \Omega_t$. This means that we can write

$$Z_{\alpha_t} = \sum_{k_t \in \Omega_t} \Lambda_{\alpha_t}^{k_t}\, Z_{k_t} \qquad (102)$$

using (44).

Now consider adjacent elementary time-slices $R_t$ and $R_{t+1}$. We have

$$p_{l_{t+1} l_t} = Z_{T-1} \cdots Z_{l_{t+1}} Z_{l_t} \cdots Z_0 \quad \text{for } l_{t+1} l_t \in \Omega_{t+1} \times \Omega_t \qquad (103)$$

There will be a state $\mathbf{p}_{t+1,t}$ for the composite region made from $R_t$ and $R_{t+1}$
associated with a generalized preparation in the region $R - R_{t+1} - R_t$. This
state has components

$$p_{k_{t+1} k_t} = Z_{T-1} Z_{T-2} \cdots Z_{k_{t+1}} Z_{k_t} \cdots Z_0 \quad \text{with } k_{t+1} k_t \in \Omega_{t+1,t} \qquad (104)$$

We can write

$$Z_{l_{t+1}} Z_{l_t} = \sum_{k_{t+1} k_t \in \Omega_{t+1,t}} r_{l_{t+1} l_t}^{k_{t+1} k_t}\, Z_{k_{t+1}} Z_{k_t} \qquad (105)$$

Putting (104) and (105) into (103) gives

$$p_{l_{t+1} l_t} = \mathbf{r}_{l_{t+1} l_t} \cdot \mathbf{p}_{t+1,t} \qquad (106)$$

Hence we can write

$$Z_{l_{t+1}} Z_{l_t} = \sum_{k_{t+1} k_t \in \Omega_{t+1,t}} \Lambda_{l_{t+1} l_t}^{k_{t+1} k_t}\, Z_{k_{t+1}} Z_{k_t} \qquad (107)$$

using (51).

This method will also work for more than two sequential elementary
time-slices. In general

$$Z_{l_{t+\tau}} \cdots Z_{l_{t+1}} Z_{l_t} = \sum_{k_{t+\tau} \cdots k_{t+1} k_t \in \Omega_{t+\tau, \ldots, t+1, t}} \Lambda_{l_{t+\tau} \cdots l_{t+1} l_t}^{k_{t+\tau} \cdots k_{t+1} k_t}\, Z_{k_{t+\tau}} \cdots Z_{k_{t+1}} Z_{k_t} \qquad (108)$$

for the sequential regions $R_t$ to $R_{t+\tau}$.

If a theory provides transformation matrices $Z$ we can use (102) and (108)
to obtain lambda matrices $\Lambda_{\alpha_t}^{k_t}$ and $\Lambda_{l_{t+\tau} \cdots l_{t+1} l_t}^{k_{t+\tau} \cdots k_{t+1} k_t}$. However, this is not sufficient
to fully specify the causaloid (or even the open causaloid) since (i) it does not
tell us how to calculate lambda matrices for a non-sequential set of elementary
time-slices and (ii) elementary time-slices may contain many elementary regions
$R_x$ and we do not know how to obtain lambda matrices between these. Up till
now everything we have done has been quite general. In particular, all this
works for any choice of nested regions $R(t)$ or, equivalently, for any choice of
disjoint elementary time-slices $R_t$. To deal with (ii) we need to add spatial
structure which we will deal with later. In the meantime we would like to make
progress on (i) and to that end we will introduce some extra assumptions which
are true in QT and CProbT for the natural choice of time slicing.

25.4 Some assumptions 

We make two assumptions which happen to be true in CProbT and QT. 

Assumption 1: We assume $|\Omega(t)| = K = \text{constant}$ for all $t$ except the
end points $t = 0$ and $t = T$ where we must have $|\Omega(0)| = |\Omega(T)| = 1$.

Assumption 2: We assume that $|\Omega_t| = |\Omega(t)||\Omega(t+1)|$ so that we
have the maximum possible number of linearly independent matrices
$Z_{k_t}$.

The first assumption follows from symmetry under time translations (except at
the end points). Both assumptions taken together imply that $|\Omega_t| = K^2$ for
$t = 1$ to $T-2$ and $|\Omega_0| = |\Omega_{T-1}| = K$ for the first and last time-slice. Consequently
the non-end point transformation matrices are $K \times K$ (that is square matrices)
and the matrices at the end points are $1 \times K$ for $R_{T-1}$ and $K \times 1$ for $R_0$ (that
is they are row and column vectors respectively).

Now consider two non-sequential time-slices $R_t$ and $R_{t'}$ with $t' > t+1$. We
have

$$p_{l_{t'} l_t} = Z_{T-1} \cdots Z_{l_{t'}}\, Z(t'-1, t+1)\, Z_{l_t} \cdots Z_0 \quad \text{for } l_{t'} l_t \in \Omega_{t'} \times \Omega_t \qquad (109)$$

where $Z(t'-1, t+1)$ is the transformation from $t+1$ to $t'-1$. We can define
the linear operator $[Z_{l_{t'}} ? Z_{l_t}]$ by

$$[Z_{l_{t'}} ? Z_{l_t}]\, Z(t'-1, t+1) = Z_{l_{t'}}\, Z(t'-1, t+1)\, Z_{l_t} \qquad (110)$$

It can be shown that it follows from assumptions 1 and 2 that

$$[Z_{l_{t'}} ? Z_{l_t}] \text{ for } l_{t'} l_t \in \Omega_{t'} \times \Omega_t \text{ are linearly independent} \qquad (111)$$

Hence,

$$\Omega_{t't} = \Omega_{t'} \times \Omega_t \qquad (112)$$

where $\Omega_{t't}$ is the omega set for region $R_{t'} \cup R_t$. When omega sets multiply then
so do lambda matrices (see (76)). Hence,

$$\Lambda_{l_{t'} l_t}^{k_{t'} k_t} = \Lambda_{l_{t'}}^{k_{t'}} \Lambda_{l_t}^{k_t} \qquad (113)$$

where $t' > t+1$. This result generalizes in two respects. First it works if we
replace the elementary time slices by clumps of any number of sequential time
slices. For example

$$\Lambda_{l_{t'} l_t l_{t-1}}^{k_{t'} k_t k_{t-1}} = \Lambda_{l_{t'}}^{k_{t'}} \Lambda_{l_t l_{t-1}}^{k_t k_{t-1}} \qquad (114)$$

where $t' > t+1$. Second, it works for more than two non-sequential clumps.
For example,

$$\Lambda_{l_{t''+1} l_{t''} l_{t'} l_t l_{t-1}}^{k_{t''+1} k_{t''} k_{t'} k_t k_{t-1}} = \Lambda_{l_{t''+1} l_{t''}}^{k_{t''+1} k_{t''}}\, \Lambda_{l_{t'}}^{k_{t'}}\, \Lambda_{l_t l_{t-1}}^{k_t k_{t-1}} \qquad (115)$$

where $t''-1 > t' > t+1$ (so we have non-sequential clumps). This second
generalization requires proving a generalization of (111).

We can now summarize as follows. If we have a physical theory for which 
there exists a choice of nested subsets R(t) such that the above two assumptions 
are satisfied and if this theory provides us with transformation matrices Z then 
we can use the above results to calculate lambda matrices for all unions of 
elementary time-slices. Given an arbitrary such union of elementary time-slices 
we proceed as follows. First we use equation (108) to obtain, from the $Z$'s, a
lambda matrix for each clump of sequential time-slices. We then multiply these
lambda matrices to get the desired lambda matrix. We can also obtain local
lambda matrices, $\Lambda_{\alpha_t}^{k_t}$, for each elementary time-slice using (102).

25.5 Calculating lambda matrices from more basic lambda matrices
In the previous subsection we needed to use $Z$'s to calculate lambda matrices. We
would like to leave the $Z$'s behind since, from our point of view, they belong to
a less fundamental way of thinking about this structure. In this subsection we
will show how we can get lambda matrices for arbitrary unions of elementary
time-slices (excluding the first and the last elementary time-slice) starting only
with local lambda matrices, $\Lambda_{\alpha_t}^{k_t}$, for each $R_t$ and lambda matrices, $\Lambda_{l_{t+1} l_t}^{k_{t+1} k_t}$, for
pairs of sequential $R_t$. But to do this we need to add one more assumption.

Assumption 3: We assume that at least one of the allowed
transformation matrices, $Z_t$, for each elementary time-slice (except for the
first and the last), $R_t$, is invertible so $Z_t^{-1}$ exists.

Note, we do not require that $Z_t^{-1}$ is in the set of allowed transformation matrices.
In $R_t$ we have the fiducial matrices $Z_{k_t}$ with $k_t \in \Omega_t$. We let

$$\Omega_t = (1, 2, 3, \ldots, L) \quad \text{for } 0 < t < T-1 \qquad (116)$$

where we have $L = K^2$. Employing the above assumption, we can, without
loss of generality, choose the first fiducial matrix, $Z_1$, to be invertible for each
elementary time-slice except the first and last. Now consider a clump of
sequential elementary time-slices from $R_t$ to $R_{t'-1}$ (with $t > 0$ and $t' < T$) where we
implement $Z_1$ for each elementary time-slice except the first in the clump where
we implement $Z_{k_t}$. The corresponding matrices

$$Z_1 Z_1 \cdots Z_1 Z_{k_t} \quad \text{with } k_t \in \Omega_t \qquad (117)$$

form a linearly independent set in terms of which we can expand general
transformations $Z(t', t)$ from $t > 0$ to $t' < T$. Hence, we can say that a fiducial set is
given by

$$\Omega_{t, t+1, \ldots, t'-1} = (11 \cdots 1,\; 211 \cdots 1,\; \ldots,\; L11 \cdots 1) \qquad (118)$$

It is very simple to verify that these omega sets satisfy the conditions given for
(77) and its generalizations (such as (78)) to hold. Hence, using this method we
can calculate the lambda matrix for an arbitrary sequential clump using just the
lambda matrices for pairwise sequential regions (so long as we exclude the first
and the last elementary time-slices). We can then apply the methods of the last
section to get the lambda matrix for a completely arbitrary set of elementary
time-slices (though still excluding the first and last elementary time-slice).

25.6 A basic causaloid diagram 

Assume that the elementary time-slices are, in fact, elementary regions. Then
we have developed sufficient machinery to calculate the open causaloid (where
the first and last elementary time-slices are regarded as being in the boundary
region). We can summarize this with a diagram, which we will call a causaloid
diagram. This diagram, shown in Fig. 6(a), consists of nodes, links, and a
sister line whose significance is the following.

1. Nodes correspond to elementary regions $R_t$ and are dressed with $\Lambda_{\alpha_t}^{k_t}$.
From this lambda matrix we can deduce the omega set $\Omega_t = (1, 2, \ldots, L)$
associated with the node.

2. Links join sequential regions and are dressed with $\Lambda_{l_{t+1} l_t}^{k_{t+1} k_t}$. From this
lambda matrix we can deduce the omega set $\Omega_{t,t+1} = (11, 21, \ldots, L1)$
associated with the link.

3. The sister line (the thin line running along the side) denotes the set of
omega sets $\Omega_{t,t+1,\ldots,t+r} = (111\cdots1, 211\cdots1, \ldots, L11\cdots1)$. This line
is to the right as we go up. This corresponds to the direction implicit in
this set of omega sets.

We exclude the first and last elementary time-slices from this diagram. Using the
causaloid diagram we can determine the open causaloid since we can calculate
any lambda matrix (except those pertaining to the first and last elementary
time-slices). A lambda matrix for arbitrary $\mathcal{O}$ can be calculated using the
clumping method obtained above. We first identify all clumps of sequential $t$'s
in $\mathcal{O}$. Then we use (77) and its generalizations (such as (78)) to calculate lambda
matrices for each sequential clump. Then we multiply these lambda matrices to
get the lambda matrix for $\mathcal{O}$. For clumps consisting of a single member we use
(63) before multiplying the lambda matrices.

If we are starting with a theory which is expressed in terms of transformation
matrices (we regard such a theory as less fundamental) then we can calculate the
lambda matrices for nodes and links using (102) and (107). We will show how
to do this for CProbT and QT in a later section. Once we have these lambda
matrices we can disregard the transformation matrix formalism and work with
the causaloid formalism instead (as long as Assumptions 1, 2, and 3 are true).
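
The first step of the clumping method, identifying clumps of sequential $t$'s,
can be sketched as follows. This is only an illustration: the function name
`sequential_clumps` and the representation of a region as a collection of
integer time labels are assumptions, and the sketch does not compute any
lambda matrices.

```python
def sequential_clumps(times):
    """Group integer time labels into maximal runs of consecutive values."""
    clumps = []
    for t in sorted(set(times)):
        if clumps and t == clumps[-1][-1] + 1:
            clumps[-1].append(t)   # t continues the current clump
        else:
            clumps.append([t])     # t starts a new clump
    return clumps


# A region made of the time-slices 2, 3, 4, 7, 9 and 10.
print(sequential_clumps([7, 2, 3, 4, 9, 10]))
```

Each clump returned here would then be handled separately before the
resulting lambda matrices are multiplied together.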

26 Adding spatial structure 

To add spatial structure we will use the notion of interacting systems. We
will label systems with $i = 1, 2, \ldots$. We can regard the situation depicted
in Fig. 6(a) as corresponding to a single system. Now consider the causaloid
diagram shown in Fig. 6(b). This depicts what we can regard as two systems
interacting. We label these systems $i$ and $j$. These labels become attached to
the corresponding sister lines. We have two types of node: nodes at crossing
points (there is only one such node in Fig. 6(b)) and nodes at non-crossing
points. We can think of crossing points as having two systems present which
may (depending on what local procedure is carried out) be interacting. More
complicated situations involving several interacting systems are shown in the
causaloid diagrams in Fig. 6(c,d). For simplicity we will restrict the maximum
number of systems in any given elementary region to two so we never have more
than two systems crossing through a node. The methods we will present can
quite easily be generalized to situations in which we relax this constraint.




The nodes are labelled by $x$ which we think of as a space-time label. For
each system we have a sequence of $x$'s (those picked out by the corresponding
sister line). Our intention is to find a way to go from theories (like CProbT and
QT) which have transformation matrices to the causaloid formalism. Thus, for
each system we have a sequence of matrices $Z^i_x$ and local omega sets $\Omega^i_x$. We
now introduce the following assumption

Assumption 4: The matrices $Z_{k^i_x k^j_x} = Z_{k^i_x} \otimes Z_{k^j_x}$ with $k^i_x k^j_x \in \Omega^i_x \times \Omega^j_x$
form a complete fiducial set at any crossing node, $x$. Here $\otimes$ is
defined in the usual way (so that $(A \otimes B)(C \otimes D) = AC \otimes BD$).

It follows from Assumption 4 that we can write a general transformation matrix
at a crossing node as

$Z_{\alpha_x} = \sum_{k^i_x k^j_x \in \Omega^i_x \times \Omega^j_x} \Lambda_{\alpha_x}^{k^i_x k^j_x} Z_{k^i_x} \otimes Z_{k^j_x}$ (119)

Thus, at the crossing node $x$ we have

$\Omega_x = \Omega^i_x \times \Omega^j_x$ (120)

This equation implies that $|\Omega_x| = |\Omega^i_x||\Omega^j_x|$. We can interpret this to mean
that when two systems are put together the number of properties is simply the
product of the number of properties of each system - we do not lose or gain
properties. Equation (119) tells us how systems interact. If we can write

$\Lambda_{\alpha_x}^{k^i_x k^j_x} = \Lambda_{\alpha^i_x}^{k^i_x} \Lambda_{\alpha^j_x}^{k^j_x}$ (121)

then we can write all transformation matrices at $x$ as $Z_{\alpha^i_x} \otimes Z_{\alpha^j_x}$ and consequently
the two systems actually cannot interact. It would further follow that we can
always write

$(X_{R_x}, F_{R_x}) = (X_{R^i_x} \cup X_{R^j_x}, F_{R^i_x} \cup F_{R^j_x})$ (122)

at $R_x$ and, hence, that we can regard anything we might do in $R_x$ as being
composed of separate actions on the two systems. In this case we might as well
regard $x, i$ and $x, j$ as corresponding to separate elementary regions - there is
no reason to have a crossing node. Interaction at $R_x$ requires that $\Lambda_{\alpha_x}^{k^i_x k^j_x}$ does
not factorize (i.e. this is required for something interesting to be going on at
the crossing node $x$). In particular, it means that there are some $(X_{R_x}, F_{R_x})$
at $R_x$ which cannot be regarded as being composed of separate actions on the
two systems. It is interesting that we have a form of interaction between two
systems here even though the omega sets multiply simply because the local
lambda matrix does not factorize.
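
The claim that a factorizing lambda matrix gives a product transformation is
just bilinearity of the tensor product, and can be checked numerically. The
sketch below is illustrative only: the helper names (`kron`,
`linear_combination`, `crossing_matrix`) and the particular 2 x 2 matrices used
as a stand-in fiducial set are assumptions, not objects defined in the text.

```python
def kron(a, b):
    """Kronecker (tensor) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def linear_combination(coeffs, mats):
    """Return sum_k coeffs[k] * mats[k] for equally sized matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(c * m[r][s] for c, m in zip(coeffs, mats)) for s in range(cols)]
            for r in range(rows)]


def crossing_matrix(lam, mats_i, mats_j):
    """Z_x = sum over (k, k') of lam[k][k'] * (Z_k tensor Z_k'), as in (119)."""
    terms = [kron(zi, zj) for zi in mats_i for zj in mats_j]
    flat = [lam[k][kp] for k in range(len(mats_i)) for kp in range(len(mats_j))]
    return linear_combination(flat, terms)


# Stand-in fiducial set (an assumption for this example only).
F = [[[1, 0], [0, 1]], [[1, 0], [0, 0]], [[0, 0], [1, 0]], [[0, 1], [0, 0]]]

lam_i = [1, 2, 0, -1]                              # coefficients for system i
lam_j = [3, 0, 1, 1]                               # coefficients for system j
lam = [[a * b for b in lam_j] for a in lam_i]      # factorized lambda, as in (121)

lhs = crossing_matrix(lam, F, F)
rhs = kron(linear_combination(lam_i, F), linear_combination(lam_j, F))
print(lhs == rhs)
```

A lambda matrix that cannot be written as such an outer product is what
signals genuine interaction at the crossing node.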

If local omega sets multiply for two systems at a crossing node then it is
reasonable that omega sets for larger regions for two systems will multiply.
Thus, we assume

Assumption 5: Omega sets $\Omega^i_{\mathcal{O}_1}$ and $\Omega^j_{\mathcal{O}_2}$ multiply where $i \neq j$ and
$\mathcal{O}_1$ and $\mathcal{O}_2$ may overlap.

With this final assumption in place we will be able to calculate the open
causaloid. First let us summarize.

1. Each non-crossing node is dressed with $\Lambda_{\alpha^i_x}^{k^i_x}$. The $i$ label is that of the
sister line passing by this node. From the lambda matrix we can deduce
the omega set $\Omega^i_x = (1, 2, \ldots, L_i)$ associated with the node.

2. Each crossing node is dressed with $\Lambda_{\alpha_x}^{k^i_x k^j_x}$. The $i$ and $j$ labels are those
of the sister lines crossing by this node. From the lambda matrix we can
deduce the omega set $\Omega_x = \Omega^i_x \times \Omega^j_x$ associated with this node.

3. Each link is dressed with $\Lambda_{l^i_x l^i_{x'}}^{k^i_x k^i_{x'}}$. The $i$ is that of the sister line running
alongside this link. The $x$ and $x'$ are those of the nodes at either end
of the link. From this lambda matrix we can deduce the omega set $\Omega^i_{xx'}$
associated with this link.

4. Each sister line is associated with a system and has a label $i$. Associated
with the sister line is a set of omega sets

$\Omega^i_{x,x',\ldots,x''} = (111\cdots1, 211\cdots1, \ldots, L_i 11\cdots1)$

for system $i$. Here $x, x', \ldots, x''$ are sequential nodes along the line which
have the line running to the right as we go along the sequence.

Basically we have a lambda matrix for each node and each link along with some
rules about how omega sets for composite regions are formed. From these we can
calculate the lambda matrix for an arbitrary composite region $R_{\mathcal{O}}$ as follows.

1. Identify all clumps of nodes in $\mathcal{O}$ which are sequential along a given
sister line $i$. For each of these clumps apply the procedure outlined in the
previous section - namely applying (77) and its generalizations (such as
(78)) to obtain lambda matrices for system $i$ for these clumps. For clumps
consisting of a single node we have $\Lambda_{l^i_x}^{k^i_x} = \delta_{l^i_x}^{k^i_x}$ as in (63).

2. Repeat this for each sister line.

3. Now multiply all the lambda matrices calculated in steps 1 and 2 to get
the lambda matrix for $R_{\mathcal{O}}$.

The local lambda matrices for the elementary regions are already given and 
hence we can calculate all lambda matrices. This means that the causaloid 
diagram dressed with lambda matrices in the manner described is a way of 
representing the open causaloid. To determine the open causaloid we need 
only specify a small subset of all the lambda matrices. We then have the above 
RULES for calculating other lambda matrices. We call these RULES the clump- 
ing method. 
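
The bookkeeping in step 1 of the clumping method can be sketched in a few
lines. Everything named here is an assumption made for illustration: a
causaloid diagram is represented as a dictionary from system labels to the
ordered list of nodes on that system's sister line, and a composite region is a
set of node labels. The sketch only finds the clumps; it does not evaluate
(77) or (78).

```python
def clumps_along_sister_lines(sister_lines, region):
    """For each system, find maximal runs of consecutive nodes lying in region."""
    region = set(region)
    result = {}
    for system, nodes in sister_lines.items():
        clumps, current = [], []
        for x in nodes:
            if x in region:
                current.append(x)        # extend the current clump
            elif current:
                clumps.append(current)   # a gap closes the current clump
                current = []
        if current:
            clumps.append(current)
        result[system] = clumps
    return result


# Two systems whose sister lines cross at node "c" (hypothetical labels).
lines = {"i": ["a", "b", "c", "d"], "j": ["e", "c", "f"]}
print(clumps_along_sister_lines(lines, {"a", "b", "d", "f"}))
```

Note that a crossing node belongs to two sister lines, so it can appear in a
clump for each of the two systems.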

In physical theories it is typical that we have symmetry such that the lambda 
matrix associated with equivalent objects (non-crossing nodes, crossing nodes, or 
links) in the causaloid diagram could be identical. In this case we can represent 
the open causaloid by just three lambda matrices and the appropriate causaloid 
diagram. 
Figure 7: It is possible that the same causaloid can be represented by two different
causaloid diagrams. A possible example is shown in this figure.



27 Time, space, and systems 

In the previous section we employed a picture of systems inhabiting space evolv- 
ing in time and interacting when they meet at the same space-time location. 
This picture underpins much theoretical thought in physics. However, from 
the point of view of the causaloid formalism this picture need not be regarded 
as fundamental. Rather, it provides an organizing principle which is useful to 
calculate the causaloid. If we start with a suitable causaloid then we may be 
able to work backwards and derive this picture. If we regard the causaloid as 
fundamental then we should contend with the possibility that this picture is 
derived. Further, it may turn out that this picture is of limited or no use in cal- 
culating some causaloids (and maybe the causaloid for QG is such an example). 
We should therefore be wary of attempting to derive physical theories from this 
picture. 

Already in the above examples the causaloid can have properties which
weaken the notion of system. In particular, we note that it is possible to
represent the same causaloid by different causaloid diagrams. We give a possible
example in Fig. 7.

28 The causaloid for classical probability theory 

We will consider a number of interacting classical bits. An example of a bit 
would be a ball which can be in box 1 or box 2. 

First we consider a single bit. This has a state given by

$\mathbf{p} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$ (123)

where $p_n$ is the probability that the ball is in box $n$ (as introduced in an
earlier section). We represent this system by a sequence of nodes labelled $t$ as in
the causaloid diagram shown in Fig. 6(a). We can evolve the state by acting on
it with a sequence of transformations $Z_{\alpha_t}$. These transformation matrices must
satisfy the properties of transformation matrices outlined earlier. To see what
this entails, first we can write

$Z_{\alpha_t} = \begin{pmatrix} z^{\alpha_t}_{11} & z^{\alpha_t}_{12} \\ z^{\alpha_t}_{21} & z^{\alpha_t}_{22} \end{pmatrix}$ (124)

We can interpret $z^{\alpha_t}_{mn}$ as the probability that the ball jumps to box $m$ and
outcome $\alpha_t$ happens given that the ball is in box $n$. Hence,

$0 \le z^{\alpha_t}_{mn}$ and $\sum_m z^{\alpha_t}_{mn} \le 1$ (125)

For each value of the label $\alpha_t$ we will have a different realization of $Z$ consistent
with these constraints. This space of allowed $Z$'s is continuous and hence the
set of labels $\alpha_t$ will be infinite. However, $\alpha_t$ is supposed to label actual data
that may be recorded on a card and so must actually belong to a finite set.
Thus, in practice we will only include a finite set of possible $Z$'s in the set we
can actually implement. Typically recording $\alpha_t$ will involve reading numbers
off scales to some accuracy and the finite resolution in doing this will lead to a
finite set.

The following allowed $Z$'s can serve as fiducial matrices

$Z_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $Z_2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $Z_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, $Z_4 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ (126)

Note that $Z_1$ is invertible, satisfying Assumption 3. In words, $Z_1$ is the identity
transformation and leaves the system unchanged. $Z_2$ is the transformation
associated with a measurement to see if the ball is in box 1 which leaves the
ball in box 1 afterwards. This means that in a backwards time direction it also
measures to see if the ball is in box 1. $Z_3$ is the transformation associated with
a measurement to see if the ball is in box 1 and which flips it into box 2 if it is.
In the backwards time direction it looks to see if the ball is in box 2 and flips it
into box 1. $Z_4$ is similar to $Z_3$ with the box labels interchanged.
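
The constraints (125) and the verbal description of the four fiducial
transformations can be checked with a short sketch. The explicit matrices below
are written out from that verbal description (identity; test for box 1; test
for box 1 then flip to box 2; the same with the box labels interchanged), and
the helper names `is_allowed` and `det2` are assumptions for illustration.

```python
def is_allowed(z):
    """Condition (125): non-negative entries, each column summing to at most 1."""
    nonneg = all(v >= 0 for row in z for v in row)
    cols_ok = all(sum(row[n] for row in z) <= 1 for n in range(len(z[0])))
    return nonneg and cols_ok


def det2(z):
    """Determinant of a 2 x 2 matrix."""
    return z[0][0] * z[1][1] - z[0][1] * z[1][0]


Z1 = [[1, 0], [0, 1]]   # identity: leaves the ball where it is
Z2 = [[1, 0], [0, 0]]   # ball found in box 1 and left in box 1
Z3 = [[0, 0], [1, 0]]   # ball found in box 1 and flipped into box 2
Z4 = [[0, 1], [0, 0]]   # ball found in box 2 and flipped into box 1

print(all(is_allowed(z) for z in (Z1, Z2, Z3, Z4)))  # every fiducial is allowed
print(det2(Z1) != 0)                                 # Z1 is invertible
```

Entry $z_{mn}$ sits in row $m$, column $n$, so a jump from box 1 to box 2
appears as the lower-left entry.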

Something interesting has happened here. We have a classical bit and so
the number of reliably distinguishable states is $N = 2$. However, since we are
in a setting where the generalized preparation of region $R_t$ is both to the past
and the future, we have $K_t = |\Omega_t| = N^2$. In the quantum case we will have
$K_t = |\Omega_t| = N^4$. The point is that in this generalized preparation setting it is
the transformation matrices $Z$ that map linearly to the $\mathbf{r}$ vectors.

We can write

$Z_{\alpha_t} = \sum_{k_t=1}^{4} \Lambda_{\alpha_t}^{k_t} Z_{k_t}$ (127)

This equation can be solved to give the components of the local lambda matrix
$\Lambda_{\alpha_t}^{k_t}$ in terms of the matrix elements of $Z_{\alpha_t}$.

$\Lambda_{\alpha_t}^{1} = z^{\alpha_t}_{22}$, $\Lambda_{\alpha_t}^{2} = z^{\alpha_t}_{11} - z^{\alpha_t}_{22}$, $\Lambda_{\alpha_t}^{3} = z^{\alpha_t}_{21}$, $\Lambda_{\alpha_t}^{4} = z^{\alpha_t}_{12}$ (128)

The real importance of these expressions is given by the fact that we know the
space of allowed $Z_{\alpha_t}$ (given in (125)). Given this we can now calculate the space
of allowed $\Lambda_{\alpha_t}^{k_t}$. The purpose of the $\alpha_t$ label is then simply to label each element
in this space.
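
As a check on the expansion of a general single-bit transformation in the four
fiducial matrices, the following sketch computes the coefficients and rebuilds
the matrix from them. It assumes the fiducial matrices as described in words
above (identity, test for box 1, and the two flips); the names `local_lambda`
and `rebuild` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

FIDUCIALS = [
    [[1, 0], [0, 1]],   # Z_1
    [[1, 0], [0, 0]],   # Z_2
    [[0, 0], [1, 0]],   # Z_3
    [[0, 1], [0, 0]],   # Z_4
]


def local_lambda(z):
    """Coefficients of z on (Z_1, Z_2, Z_3, Z_4)."""
    (z11, z12), (z21, z22) = z
    return [z22, z11 - z22, z21, z12]


def rebuild(lam):
    """Return sum_k lam[k] * Z_k."""
    return [[sum(c * zk[r][s] for c, zk in zip(lam, FIDUCIALS)) for s in range(2)]
            for r in range(2)]


# An allowed Z: entries non-negative, column sums 5/6 and 9/20.
z = [[Fraction(1, 2), Fraction(1, 4)],
     [Fraction(1, 3), Fraction(1, 5)]]
lam = local_lambda(z)
print(lam)
print(rebuild(lam) == z)
```

Because the four fiducial matrices are linearly independent, the coefficients
are unique, and the allowed region for them is just the image of the allowed
region for the matrix entries.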

Now consider two sequential times $t$ and $t + 1$. An omega set is given by

$\Omega_{t+1,t} = (11, 12, 13, 14)$ (129)

We have

$Z_{l'} Z_{l} = \sum_{k'k \in \Omega_{t+1,t}} \Lambda_{l'l}^{k'k} Z_{k'} Z_{k}$ (130)

We can solve these equations explicitly to obtain $\Lambda_{l'l}^{k'k}$. Omitting the trivial
elements ($\Lambda_{l'l}^{k'k} = \delta_{l'l}^{k'k}$ for $l'l, k'k \in (11, 12, 13, 14)$) we have

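$\Lambda_{l'l}^{k'k}$, with rows labelled by $k'k$ and columns labelled by $l'l$:

k'k \ l'l   21  22  23  24  31  32  33  34  41  42  43  44

11           0   0   0   0   0   0   0   1   0   0   0   0

12           1   1   0   0   0   0   0  -1   0   0   1   0

13           0   0   0   0   1   1   0   0   0   0   0   0

14           0   0   0   1   0   0   0   0   1   0   0   0

(131)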



We have now calculated the lambda matrices for nodes and links and hence we 
have enough to specify the open causaloid. 
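
The link lambda matrix can be computed mechanically by multiplying pairs of
fiducial matrices and expanding each product in the basis $Z_1 Z_k$ (which is
just $Z_k$, since $Z_1$ is the identity). The sketch below assumes the fiducial
matrices as described in words in this section; the function names are
hypothetical.

```python
Z = {
    1: [[1, 0], [0, 1]],
    2: [[1, 0], [0, 0]],
    3: [[0, 0], [1, 0]],
    4: [[0, 1], [0, 0]],
}


def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def expand(z):
    """Coefficients of z on Z_1 Z_k for k = 1..4, i.e. on k'k = 11, 12, 13, 14."""
    (z11, z12), (z21, z22) = z
    return [z22, z11 - z22, z21, z12]


def link_lambda():
    """Map each pair (l', l) to the coefficients of Z_l' Z_l, as in (130)."""
    return {(lp, l): expand(matmul(Z[lp], Z[l]))
            for lp in range(1, 5) for l in range(1, 5)}


table = link_lambda()
for (lp, l), coeffs in sorted(table.items()):
    print(f"{lp}{l}: {coeffs}")
```

For instance, the entry for $l'l = 34$ expresses $Z_3 Z_4$ as
$Z_1 Z_1 - Z_1 Z_2$, the transformation that tests for box 2 and leaves the
ball there.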

Now consider having a number of bits interacting according to one of the
causaloid diagrams in Fig. 6(b-d). We switch to labelling nodes by $x$ and label
different systems by different $i$. We have already calculated lambda matrices
for non-crossing nodes and links. To calculate the lambda matrix for a crossing
node we have to solve

$Z_{\alpha_x} = \sum_{k^i_x k^j_x} \Lambda_{\alpha_x}^{k^i_x k^j_x} Z_{k^i_x} \otimes Z_{k^j_x}$ (132)

The constraints on the space of $Z_{\alpha_x}$ (namely that elements are non-negative and the
sum of each column is no greater than 1) induce a constraint on the space of
$\Lambda_{\alpha_x}^{k^i_x k^j_x}$, and the point of $\alpha_x$ is simply to label elements in this space.

We have explicitly calculated the lambda matrix for a non-crossing node and
for a link and shown how to calculate it for a crossing node. Given these three
lambda matrices and the causaloid diagram we have fully specified the open
causaloid and therefore provided a complete predictive framework for CProbT.
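
To make the crossing-node calculation concrete, here is a sketch that solves
for the lambda matrix of a two-bit transformation. It assumes the single-bit
fiducial matrices described in words earlier in this section, with system $i$
as the first tensor factor. The function names and the choice of the SWAP
transformation as the example are assumptions made for illustration.

```python
FIDUCIALS = [
    [[1, 0], [0, 1]], [[1, 0], [0, 0]], [[0, 0], [1, 0]], [[0, 1], [0, 0]],
]


def local_lambda(z):
    """Coefficients of a 2 x 2 matrix on (Z_1, Z_2, Z_3, Z_4)."""
    (z11, z12), (z21, z22) = z
    return [z22, z11 - z22, z21, z12]


def block(z, m, n):
    """The 2 x 2 block of a 4 x 4 matrix at block-row m, block-column n."""
    return [[z[2 * m + r][2 * n + s] for s in range(2)] for r in range(2)]


def crossing_lambda(z):
    """lam[k][k'] such that z = sum lam[k][k'] * (Z_k tensor Z_k')."""
    b11, b12, b21, b22 = block(z, 0, 0), block(z, 0, 1), block(z, 1, 0), block(z, 1, 1)
    diff = [[b11[r][s] - b22[r][s] for s in range(2)] for r in range(2)]
    # Expand the first factor, then expand each 2 x 2 coefficient block.
    return [local_lambda(c) for c in (b22, diff, b21, b12)]


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def rebuild(lam):
    total = [[0] * 4 for _ in range(4)]
    for k, zk in enumerate(FIDUCIALS):
        for kp, zkp in enumerate(FIDUCIALS):
            term = kron(zk, zkp)
            for r in range(4):
                for s in range(4):
                    total[r][s] += lam[k][kp] * term[r][s]
    return total


# SWAP exchanges the two bits; it is an allowed Z (a permutation matrix).
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
lam = crossing_lambda(SWAP)
print(lam)
print(rebuild(lam) == SWAP)
```

The lambda matrix obtained for SWAP has a non-vanishing 2 x 2 minor, so it is
not an outer product of two single-system coefficient vectors: in the language
of (121) it does not factorize, as expected for a transformation that makes the
two bits interact.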



29 The causaloid for quantum theory 

We could proceed in an exactly analogous way in QT as we did in CProbT. 
Thus, we could take a number of interacting qubits. Of course, we don't have 
to restrict ourselves to qubits. We could have systems whose Hilbert space
dimension is different from 2 and we could have systems of various Hilbert
space dimension.

Thus, first let us consider a single system with associated Hilbert space of
dimension $N$. We have a causaloid diagram as shown in Fig. 6(a). If we follow
exactly the technique above then we would use $Z$ matrices for quantum theory.
We saw in an earlier section how QT can be formulated with $Z$ matrices. However, we can
instead proceed with superoperators, $\$$, which are more familiar. In fact there
is an invertible linear map between superoperators and $Z$ matrices (25) so we
can switch between the two objects at any time. First we choose a fiducial set
of linearly independent superoperators for each $t$

$\$_{k_t}$ for $k_t \in \Omega_t = (1, 2, \ldots, N^4)$ (133)

Then, since there is a linear map between $Z$'s and $\$$'s, we can write

$\$_{\alpha_t} = \sum_{k_t \in \Omega_t} \Lambda_{\alpha_t}^{k_t} \$_{k_t}$ (134)

instead of (102). Similarly we can write

$\$_{l_{t+1}} \$_{l_t} = \sum_{k_{t+1} k_t \in \Omega_{t+1,t}} \Lambda_{l_{t+1} l_t}^{k_{t+1} k_t} \$_{k_{t+1}} \$_{k_t}$ (135)

instead of (107) where

$\Omega_{t+1,t} = (11, 12, 13, \ldots, 1N^4)$ (136)

We can solve (134) to find the space of $\Lambda_{\alpha_t}^{k_t}$ from the known space of the
superoperators. Then the $\alpha_t$ label is used to label each point in this space (or at
least a large set of points consistent with the resolution of the experiment). We
can also solve (135) to get the lambda matrix for pairs of sequential time-slices
(which we are taking to be elementary regions). This matrix will be $|\Omega_{t+1} \times \Omega_t|$
by $|\Omega_{t+1,t}|$. That is, it will be $N^8 \times N^4$. In the case of a qubit this is $256 \times 16$.
This is a rather big object (though not too big). This size can be thought of as
the price we pay for working in a framework (the causaloid formalism) capable
of expressing any physical theory (at least any physical theory which correlates
data as described in Sec. 15). We now know the lambda matrices for nodes
and links and so have specified the open causaloid for this quantum system of
dimension $N$.
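
The sizes quoted above follow from simple counting, which the sketch below
makes explicit for both the quantum case ($|\Omega_t| = N^4$) and the classical
case ($|\Omega_t| = N^2$). The function name `lambda_sizes` and its `quantum`
flag are assumptions introduced for this illustration.

```python
def lambda_sizes(n, quantum=True):
    """Return (|Omega_t|, (|Omega_{t+1} x Omega_t|, |Omega_{t+1,t}|)).

    n is the number of reliably distinguishable states: the Hilbert space
    dimension in QT, or the number of classical configurations in CProbT.
    """
    node = n ** 4 if quantum else n ** 2   # size of the local omega set
    return node, (node * node, node)       # link lambda matrix dimensions


print(lambda_sizes(2, quantum=True))    # a qubit
print(lambda_sizes(2, quantum=False))   # a classical bit
```

For a classical bit the same counting gives a 16 by 4 link lambda matrix, of
which 4 columns are the trivial ones omitted from table (131).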

We can now consider many quantum systems interacting as shown in Fig.
6(b-d). The $i$th such system has Hilbert space dimension $N_i$. We switch to
labelling nodes with $x$'s. The above calculations provide us with the lambda
matrices for non-crossing nodes and for links. We only need to find the lambda
matrix for crossing nodes. This is given by solving

$\$_{\alpha_x} = \sum_{k^i_x k^j_x \in \Omega^i_x \times \Omega^j_x} \Lambda_{\alpha_x}^{k^i_x k^j_x} \$_{k^i_x} \otimes \$_{k^j_x}$ (137)

which is obtained from (119). On solving this we obtain the space of $\Lambda_{\alpha_x}^{k^i_x k^j_x}$ from
the known space of $\$$'s for the two systems. We label points in this space with
$\alpha_x$ (up to the resolution of the experiment). Given this lambda matrix we now
have the open causaloid and so can leave the usual quantum formalism behind.

30 The causaloid without boundary conditions 

The causaloid is defined for a predictively well defined region $R$ with a condition
$C$ on the cards outside $R$. The open causaloid is defined to exclude cards in
the boundary region $R_{\mathcal{O}_b}$. The idea is that condition $C$ is only relevant to this
boundary region - conditional probabilities in the remainder of $R$ are unaffected
by $C$. If we are restricting our attention to $R - R_{\mathcal{O}_b}$ then we might ask why
we had condition $C$ in the first place. If we are not interested in the cards
that go into verifying $C$ why even collect these cards? Put another way, can we
simply identify $R - R_{\mathcal{O}_b}$ with the full pack $V$? If we retrace our steps we can see
that the reason we introduced $C$ was so we could have conditional probabilities
of the form $\mathrm{Prob}(X_R, F_R | C)$. Completely unconditional probabilities make no
sense. However, although we use probabilities which are only conditioned on $C$
as intermediate steps in our construction of the open causaloid, when we use
the open causaloid to calculate probabilities we are calculating the probability
of something in $R - R_{\mathcal{O}_b}$ conditionalized on something else that happened in
$R - R_{\mathcal{O}_b}$. The conditioning on $C$ is also implicit, but if we accept that conditional
probabilities in $R - R_{\mathcal{O}_b}$ are independent of $C$, then this conditioning on $C$
is actually redundant. This motivates us to now define the causaloid in the
following way.

The causaloid for the full pack $V$ made up of elementary regions
$R_x$ is, if it exists, defined to be that thing represented by any
mathematical object which can be used to obtain $\mathbf{r}_{\alpha_{\mathcal{O}}}(R_{\mathcal{O}})$ for all
measurements $\alpha_{\mathcal{O}}$ in region $R_{\mathcal{O}}$ for all $R_{\mathcal{O}} \subseteq V$ where these $\mathbf{r}$ vectors
can be used to calculate conditional probabilities using (73).

When we say that we use (73) to calculate conditional probabilities we mean
that we look to the case where the probabilities are independent of the state
$\mathbf{p}(R_1 \cup R_2)$ which basically means that we use (74) or (75).

If we look back at the two cases we have explicitly worked out, CProbT and
QT, then we see that we can actually regard the open causaloid as the causaloid
for the full pack $V$. By defining this object we have effectively removed the
boundary condition and so have a much more useful object. The causaloid for
the full pack $V$ fully characterizes the physical theory without the qualifications
given in Sec. 21. However, we cannot actually be sure that there will exist such a
causaloid. It is possible that conditions outside the region of the experiment will
always influence conditional probabilities inside $V$ and, if these are not taken
into account, we cannot have well defined probabilities in $V$ and so cannot have
a causaloid for $V$. One way to avoid this would be to take the causaloid to
correspond to the whole universe so there can be no possibility of influences
outside $V$. If we do this we get into various issues such as what it means to
have repeatability and what it means to take data for the whole universe. We
will discuss these issues in a later section.

Figure 8: Dynamic causal structure can manifest itself when the effective causaloid
in $\tilde{R}$ depends on the data in $R - \tilde{R}$.

31 Dynamic causal structure 

The causaloid is a fixed object. Yet at the same time we have not assumed 
any fixed causal structure in deriving the causaloid formalism. That is to say 
we have not specified any particular causal ordering between the elementary 
regions. In this sense we must have allowed the possibility of dynamic causal 
structure. It is interesting to see a little more explicitly how this can work in the
causaloid formalism.

The best way to see this is as follows. Given a causaloid for $R$ (we could
consider it for $R - R_{\mathcal{O}_b}$, or $V$ instead) we can imagine that we have collected
data $(X_{R-\tilde{R}}, F_{R-\tilde{R}})$ in region $R - \tilde{R}$. We can regard this new data as
conditioning, like $C$, for a new causaloid in $\tilde{R}$. Let us call this new causaloid $\tilde{\Lambda}$.
We could alternatively imagine collecting different data, $(X'_{R-\tilde{R}}, F'_{R-\tilde{R}})$, in the
same region $R - \tilde{R}$ and obtain the causaloid $\tilde{\Lambda}'$ for the same region $\tilde{R}$. This is
shown in Fig. 8. Both $\tilde{\Lambda}$ and $\tilde{\Lambda}'$ can be calculated from the original causaloid.
Now, it is possible that the causal structure evident in $\tilde{\Lambda}$ is quite different to
that evident in $\tilde{\Lambda}'$. For example, it could be that some subset of nodes in $\tilde{R}$ has
the causaloid diagram shown in Fig. 9(a) if the causaloid is $\tilde{\Lambda}$ whereas
the same nodes have the causaloid diagram shown in Fig. 9(b) if the causaloid
is $\tilde{\Lambda}'$ (note though that a general causaloid cannot be represented by causaloid
diagrams of this type).

Figure 9: Depending on data collected in some other region, the causal structure
among these four nodes may be as in (a) or (b).

Consideration of the causaloid diagrams for CProbT and QT leads us to the
conclusion that these theories do not have dynamic causal structure of this type. 
We can see that this type of reconditionalization will not change the pattern 
of links. This supports our starting assertion that CProbT and QT have fixed 
causal structure. 

Dynamic causal structure is likely to be quite generic for causaloids. However,
it is unlikely to be as clear cut as the hypothetical example we just discussed.
In general we cannot expect the sort of clear cut causal structure we see
evident in the causaloid diagrams shown earlier. In general, the causal relationship
between nodes may be more complicated than can be represented by pairwise
links. Thus, when we speak of "causal structure" we do not necessarily intend
to imply that we have well defined causal structure of the type that allows us
to determine whether two nodes are separated by a time-like or a space-like
interval.

The causaloid can be thought of as containing all potential causal structures. 
The particular causal structure that gets manifested depends on what data 
is collected, and even after data has been collected it may not be useful or
even possible to say retrospectively that two regions were time-like or space-like
separated.




32 Problems with putting general relativity in 
the causaloid framework 



General relativity is, ultimately, an empirical theory concerning data and so it
should be possible to put it in the framework here or one very similar to it. 
However, as things stand, there are a number of obstacles. 

General relativity is based on a space-time continuum whereas the structure 
we have described, being based on actual data, is discrete. There are various 
approaches we could take to dealing with this. First, we could attempt to find 
a continuous version of the causaloid formalism. This would involve taking 
limits to pass from a discrete to a continuous structure. Alternatively, we could
use a discrete version of GR such as the Regge calculus (for other discrete
approaches see the cited literature). Since our ultimate objective is to formulate a theory
of QG which is likely to be discrete and then show that GR emerges as a limiting
case we might well be satisfied with a discrete version of GR. We will discuss
issues pertaining to the continuum in the next section.

General relativity is deterministic. If general relativity had an external time 
we could use CProbT and form probabilistic mixtures of pure states. CProbT
is well defined and so this would be a fairly straightforward matter.
Deterministic GR would then simply correspond to the evolution of pure states in this
framework. However, since we do not have external time this option is not open
to us. Rather we have to exploit the causaloid structure. Despite the fact that a
theory of quantum gravity would require us to address the issue of probabilities
in the context of GR, there appears to be very little work in this area. One
issue is that in GR we tend to find a solution for a whole space-time whereas
probabilities are often understood in the context of a repeatable situation. Even
if we could create many copies of the universe, we can only have access to
data from one copy. This issue, at least, can be addressed (see Sec. 35).

We have introduced the notion of agency. We allow the possibility of al- 
ternative actions - such as the setting of knobs. Agency is actually a very 
common notion in physics. In Newtonian physics we speak of forces being ap- 
plied. In quantum theory we think of performing different measurements. In 
both classical and quantum physics, we can remove the need for agency by, for 
example, including a full description of the agents in the Hamiltonian. Even 
so, we might ask whether we can really understand physical theories without 
some notion of agency. An equation in physics tells us what would happen were 
various counterfactual possibilities realized. In quantum theory the notion of 
agency is especially entrenched. The quantum state is, one might argue, best 
understood as a list of probabilities pertaining to incompatible measurements 
- that is pertaining to different possible actions - and is therefore difficult to 
interpret without agency. The notion of agency is not naturally incorporated 
into Einstein's field equations. However, we could attempt to incorporate it by
interpreting the eventual effects of tiny differences in $T_{\mu\nu}$ well below the
experimental resolution we are working to as corresponding to different choices of an
agent. For example, whether a knob is set in one position or another depends
on tiny differences in the brain of the experimentalist. A special case is where
there is only ever one choice of action. Consequently, no-agency is a special
case of agency. This means, at least formally, there is no problem of putting
GR into an agency based framework. However, if we do not have active agency
we lose something potentially quite important. The notion of causal structure 
is best understood from an agency point of view. Thus, we can ask whether 
measurement outcomes in one location depend on what action is implemented 
in another location. We cannot employ this understanding of causality if we do 
not have an active notion of agency. 

In GR coordinates, $x^\mu$, are usually understood to be abstract. It follows
from the principle of general covariance that, having found a solution $(g_{\mu\nu}, T_{\mu\nu})$
on some manifold $\mathcal{M}$, the correct description of nature is given by forming the
equivalence class over all diffeomorphisms of that manifold (smooth mappings of
the points in $\mathcal{M}$ onto itself). The abstract coordinates, $x^\mu$, themselves have no
physical meaning. However, the $x$ which appears on the cards in the causaloid
formalism is an actual observation and must be read off an actual physical
system. In one sense this is better. The cards have actual data on them and, in
this sense, must correspond to observables. In some sense we are (from the point
of view of GR) already working with diffeomorphism invariant objects. However,
this leaves the problem of how to put GR in the causaloid formalism. One
approach to this problem is to attempt to introduce actual physical coordinates
into GR. Einstein, in a semi-popular account of GR [13], spoke of a
"reference-mollusc" as a way of giving physical meaning to the abstract coordinates. Thus
he imagined many small clocks attached to a non-rigid reference body such that
infinitesimally displaced clocks have readings that are infinitesimally close (see
also the related literature). We could consider many different molluscs. Einstein then states
"The general principle of relativity requires that all these molluscs can be used 
as reference-bodies with equal right and equal success in the formulation of the 
general laws." Another physical reference frame is provided by the fact that 
the universe is, at a fine grained level, highly non-isotropic and inhomogeneous. 
Therefore the view from each point is different. This provides a way of physically 
labelling each space-time point (or at least each small region) . A modern version 
of a physical reference frame is the GPS system. Any space-time point could 
be labelled with the retarded times received from four appropriately positioned 
clocks. Rovelli has considered how one might go about measuring the metric
using such a GPS system [12]. General relativity has the property, in common
with other pre-quantum physical theories, that it is fairly harmless to consider
counterfactuals. Thus, even if there is no reference mollusc, we can consider,
counterfactually, what would have happened if there had been one. Rather than
having $T_{\mu\nu}$ we have $T_{\mu\nu} + T^{\mathrm{mollusc}}_{\mu\nu}$. If $T^{\mathrm{mollusc}}_{\mu\nu}$ is sufficiently small compared
to $T_{\mu\nu}$ then the two solutions of Einstein's field equations will differ very little.
Hence, we can draw empirical conclusions about the solution with no mollusc by
looking at the solution with a mollusc. In quantum theory such counterfactuals
are notoriously more tricky. Two solutions differing only in a single photon can
be quite radically different. A successful attempt to find a theory of QG should
embrace rather than shy away from this issue.
33 New calculus



Newton invented differential calculus for the purpose of understanding the
motion of particles. In doing so he considered a ratio $\delta x/\delta t$ and took the limit
as $\delta t \to 0$. There are two ways we might understand this limit. We might
imagine that time is ontologically continuous and so it makes sense to consider
smaller and smaller time intervals. Alternatively, we might take an operational
approach. Thus, we might imagine that there is no limit to how accurately
we can measure $\delta x$ and $\delta t$. If we take an operational approach then we know
that there is certainly a practical limit to how accurately we can measure these
intervals and, furthermore, there may be limits in principle.

In practice, rather than directly measuring $\delta x$ and $\delta t$ to some incredible
precision we often perform a measurement over a much longer time and extrapolate
back to get a value of $\delta x/\delta t$. For example, we may deduce the transverse
velocity of a particle emerging from a hole in a card from the position, $y$, at which it hits
a screen placed at some distance. We will have some model of the motion of
the particles which will allow this deduction. Tiny differences in this transverse
velocity will correspond to big differences in $y$. And this is really the key point.
The reason we want to imagine that we have a well defined ratio $\delta x/\delta t$ is that,
even though it is essentially impossible to measure it as defined, our models
predict that tiny differences in the ratio correspond to big differences at later
times. There is a danger that the quantity $\delta x/\delta t$ only derives its meaning from
the model and this process of extrapolation and that it does not actually have
either the ontological or operational meaning we allot to it.

Even though very small differences in $\delta x/\delta t$ may not be measurable, the
much bigger differences in y at the screen are. Ideally we would like a calculus 
which is not based on quantities that are not directly measurable (and may 
have no meaning) but is still predictively useful. How are we to account for 
the measurable differences in y other than in terms of small differences in some 
quantity $\delta x/\delta t$ we cannot measure? In fact, in developing the causaloid
formalism, we have already given an answer to this. We developed the notion of
fiducial measurements. Thus, though we might be able to measure a large set of 
quantities, it is only necessary to measure a small fiducial set to deduce all the 
others. Thus, rather than relating y to a quantity that cannot actually be mea- 
sured, we can simply relate it to other quantities like y which can actually be 
measured. The causaloid formalism gives a consistent way of doing this (though 
for probabilities). For example, we might imagine that we can apply various 
fields to particles emerging from the hole, place the screen at various distances, 
and so on. We would like to know the position at which the particle is detected
on the screen for all these various things we might do. These various positions will
be related and the causaloid formalism provides the appropriate mathematical 
machinery for relating them. 
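
The screen example can be made concrete with a minimal sketch. It assumes force-free flight from the hole to a flat screen and a known longitudinal speed; this model, the function names, and all numbers are illustrative assumptions and are not part of the causaloid formalism. It shows a tiny velocity difference turning into a large separation at the screen, and it shows one measurable position being predicted directly from another without quoting the unmeasured ratio.

```python
# Illustrative sketch only: assumes force-free (ballistic) flight from the
# hole to the screen. All names and numbers here are hypothetical.


def transverse_velocity(y, distance, v_long):
    """Infer the transverse velocity from the screen position y.

    The particle takes distance / v_long to reach the screen, so under the
    assumed model the transverse velocity is y divided by that time.
    """
    time_of_flight = distance / v_long
    return y / time_of_flight


def predict_position(y_ref, distance_ref, distance_new):
    """Predict the position on a screen at distance_new from the position
    y_ref measured on a screen at distance_ref.

    Under the same model y is proportional to the screen distance, so this
    relates two measurable positions without using the transverse velocity.
    """
    return y_ref * distance_new / distance_ref


if __name__ == "__main__":
    # Two transverse velocities differing by one part in a thousand.
    v_a, v_b = 1.000, 1.001
    v_long, distance = 100.0, 1.0e4
    time_of_flight = distance / v_long
    print("separation at the screen:", (v_b - v_a) * time_of_flight)
    print("position at doubled distance:", predict_position(0.02, 2.0, 4.0))
```

The second function plays the role of a fiducial relation in miniature: one measured position fixes the others, and the velocity never appears.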

One reason we seek to define quantities like $\delta x/\delta t$ is that we want to know
the state at a given time t. The usual notion of "state at time t" is that it
pertains to some ontological state of affairs at time t which can be measured, in
principle, at time t or at least within some short time t to $t + \delta t$. But it is not
necessary that nature admits such notions at a fundamental level. And, even if
nature does not admit these notions, we will still be able to do empirical science 
using the causaloid formalism. In the causaloid formalism the basic object, $\Lambda$,
is built out of lambda matrices. These matrices pertain to operationally defined 
elementary regions at the level of experimental apparatuses. We can use the 
causaloid to give meaning to the state at time t by choosing a foliation as in (|S7|) . 
The state at time t is p(t) = p(R(t)) where R(t) contains everything of interest 
that comes after time t. The fiducial measurements $\Omega(t)$ are in R(t). A stronger
constraint would be to demand that the fiducial measurements actually fall in
$R_t = R(t) - R(t+1)$. That is we might demand that it be possible to establish
the state at time t by measurements carried out during the small time interval t 
to t+1. This requirement, or something like it, would seem to be a feature of all 
current physical theories. However, there is no reason to demand it in principle. 
If we drop this constraint then we move beyond the type of situation envisaged
by Newton where the state at time t is specified by quantities measurable in
principle during the time interval t to $t + \delta t$. Thus, we see that the causaloid
formalism provides us with a new calculus capable of dealing with situations 
where Newton's differential calculus would be inappropriate. 

The advantage of differential calculus and the implied ontology is that, where 
it works, it affords a simple picture of reality which allows significant symme- 
tries to be applied. We can hope that increased familiarity with the causaloid 
approach may achieve something similar. 

In classical physics, including GR, the distinction between ontological no- 
tions of space and time and the operational support for them is not an important 
one. There are no fundamental limitations coming from classical theories on how 
small apparatuses which might measure $\delta x$ and $\delta t$ can be. In QT the situation
is a little more subtle. QT as applied to, say, systems of atoms does predict a
scale which suggests a limit on how small apparatuses might be. However, we
can imagine that there are other fields which can probe nature on a smaller
scale and there is no limit coming from QT as to how small this scale might
be. In these cases we can always imagine in principle probing on much smaller
scales than the characteristic scales of the physics being considered. It is only
when we get to QG that we hit in-principle limits to our ability to probe nature
directly at smaller and smaller scales. Any instrument used to probe at these
small scales will necessarily have mass and energy. As $\delta x$ and $\delta t$ become smaller
the associated energy and momentum will lead to black hole formation. This
happens at the Planck scale. Thus in QG we expect for the first time a clear
breakdown in our ability to give operational support for ontological notions of
continuous time and space (see also ^H]). We should be wary of introducing 
ontological notions which are not backed up, at least in principle, by operational 
procedures since we risk introducing factitious elements into our theory. Hence, 
in QG we expect that we will have to use a different calculus. The causaloid 
calculus provides a way forward here. 

The formulation of ProbGR is likely to be useful in formulating QG. However,
it is worth noting that this likely operational breakdown of the continuum
distinguishes QG from ProbGR and constitutes an extra problem that we must
deal with.

It is often stated that experiments to test a theory of QG will involve probing 
nature at the Planck scale. It is no coincidence that apparatuses we might 
construct to do this would have to be very big. As illustrated above, postulated 
variation at a small scale shows up at a large scale and we might even doubt 
that there is any ontological meaning to talking about what is happening on this 
small scale. The fiducial measurements in the causaloid formulation for such an 
experiment will, we expect, be at a much larger scale than the Planck scale. 

34 Ideas on how to formulate GR and QG in 
the causaloid framework 

In the causaloid framework the lambda matrices that make up the causaloid tell 
us everything. In a formulation of GR they must therefore replace both $g_{\mu\nu}$ and
$T_{\mu\nu}$. In this respect it is interesting to note that the two roles of $g_{\mu\nu}$ pointed out
in Sec. EH have analogues in the causaloid formalism. Thus, the local lambda 
matrices tell us about local physics as does the value of $g_{\mu\nu}$ at a point, and the
composite system lambda matrices tell us about how elementary regions become 
correlated as does the connection $\Gamma^{\lambda}_{\mu\nu}$ (which depends on the local variation of
$g_{\mu\nu}$). There are two approaches we could take to formulating GR in the causaloid
framework. First we could attempt to put a discrete version of GR (such as the 
Regge calculus) in the framework. Secondly, we could attempt to rederive GR 
from scratch in this new framework perhaps by imitating appropriate aspects of 
Einstein's original derivation of GR. The second approach is likely to be more 
fundamental though the first approach may help us gain important insights. 

Data is collected on cards and is typically of the form (x, a, s). We can think 
of x as playing two roles. First, it provides a label that differentiates the ele- 
mentary regions and second, it provides a local orientation in four-dimensional 
space-time. The action a will typically have some direction associated with it. 
Thus, we might measure the spin along a certain axis. If we want to think
relativistically we should use a four-vector $a^\mu$ to describe this measurement (this
being measured relative to the local orientation provided by x). Similarly the
outcome s will also have a direction and should be denoted with a four-vector
$s^\nu$. The measurement $(X_{R_x}, F_{R_x})$ would therefore be associated with two
four-vectors. It is reasonable, therefore, to suppose that a fiducial set is given just by
taking such measurements associated with the sixteen components (as happens
in tensorial analysis). Thus, we can select a subset of all measurements where a
and s are orientated along the $\mu$ and $\nu$ directions respectively. This will lead to
fiducial measurements $k_{\mu\nu}$. For each $\mu\nu$ there will be many such $k$'s
corresponding to each possible value of a and s. The situation may be more complicated
since a and s may have more $\mu$-type labels. Motivated by the discussion in Sec.
1261 we could attempt to formalize this by saying that in each elementary region 
$R_x$ there exist fields $k^i_{\mu\nu}(x)$ where

$$\Omega(x) = \Omega^1(x) \times \Omega^2(x) \times \cdots \times \Omega^n(x) \tag{138}$$

with

$$\Omega^i(x) = \{\text{all } k^i_{\mu\nu}(x) \text{ for given } x, i\} \tag{139}$$

Using suggestive notation we could have fields $g_{\mu\nu}$ and $T_{\mu\nu}$. We would then
have local lambda matrices 

and composite system matrices 

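$$\Lambda^{g_{\mu\nu}(x)\, g_{\sigma\tau}(x')}_{h_{\mu\nu}(x)\, h_{\sigma\tau}(x')}, \qquad \Lambda^{T_{\mu\nu}(x)\, T_{\sigma\tau}(x')}_{S_{\mu\nu}(x)\, S_{\sigma\tau}(x')} \tag{141}$$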

where we use h and S to label the precompression elements in the product omega 
sets. We can form similar objects for composite systems composed of more than 
two elementary regions. Since we do not have fixed causal structure we do not 
expect the methods of Sec. |2H1 to work. If these initial steps are correct then 
the problem of formulating both GR and QG in this framework is to find the 
corresponding causaloids. 

In general relativity, the principle of general covariance requires that the 
form of the laws, expressed as equations between tensorial objects, is invariant 
under general coordinate transformations. We could attempt to do something 
similar in the causaloid formalism. Thus, let us imagine that the causaloid is 
determined by solving some equations. Then there are two levels at which we 
could demand something like general covariance. First, at a general level. We 
can write down the causaloid with respect to an arbitrary set of omega sets 
for each region $R_{\mathcal{O}}$. We could require that the equations which determine the
causaloid take the same form for any choice of omega sets. Second, and less 
generally, we could just require that these equations are invariant under general 
transformations of the coordinates $x^\mu \to x'^\mu$ which induce a transformation
$k^i_{\mu\nu}(x) \to k'^i_{\mu\nu}(x)$ of the local fiducial measurements, which, in turn, induces a
transformation $\Omega^i(x) \to \Omega'^i(x)$ in the local omega sets.

At least in the case of GR we should seek a way to implement the principle 
of equivalence. As a first stab at this we might require that there always exists a 
coordinate transformation inducing a transformation to local omega sets $\Omega_{\text{SR}}(x)$
such that the local lambda matrices predict special relativistic physics in $R_x$.
However, since we are in a probabilistic context we have to admit the possibility 
that we have uncertainty as to what this local frame is. A sufficiently precise 
measurement in $R_x$ should be able to establish what the set of local inertial
frames is for each field i. We can let this measurement be associated with a
set of measurement vectors $r^i_{g_{\mu\nu}}(x)$ where $g_{\mu\nu}$ labels the local inertial frames which
leave $g_{\mu\nu}$ in Minkowski form. The equivalence principle requires that the fields
are correlated so that, when there is certainty as to what the local inertial frame 
is for each field, there is agreement. Thus, 

$$\text{If } r^i_{g_{\mu\nu}}(x) \cdot p(x) = 1 \text{ and } r^j_{h_{\mu\nu}}(x) \cdot p(x) = 1 \text{ then } h_{\mu\nu} = g_{\mu\nu} \tag{142}$$

This should work in GR. However, in QG we expect to get pure states which 
are superpositions of other pure states. Thus, if we have $p_1(x)$ and $p_2(x)$
each of which agrees with special relativistic physics but with different $g_{\mu\nu}$,
then we expect other pure states corresponding to some sort of superposition 
of these for which there is no transformation to special relativistic physics. The 
implementation of the equivalence principle in QG would therefore seem to 
require a preferred basis with respect to which it holds. 

If we are successful in formulating GR (actually ProbGR) in the causaloid 
framework we can then attempt to formulate QG. It is quite likely that some of 
the differences between ProbGR and QG mirror the differences between CProbT 
and QT and this might give us a strong handle on how to obtain QG from 
ProbGR. There are two key differences between CProbT and QT. 

First, in CProbT we have $K = N$ whereas in QT we have $K = N^2$ (in the
sense discussed in Sec. O- Thus, if we are to build up a complete set of fiducial 
measurements in QG it is likely that we will want to add extra measurements 
to those of GR. In fact the situation is a little more complicated for the causaloid
than it was in Sec.El Since the preparation for the elementary region R x is both 
to the future and the past (and the sides) of R x it is the transformation matrices 
which map linearly to the r vectors and hence, as discussed in Sec. [2HI we have 
$|\Omega_x| = N^2$ in CProbT and $|\Omega_x| = N^4$ in QT. To get to QT from CProbT
we need to add two new fiducial r's for each pair of fiducial measurements in
CProbT.
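
The counting in the last sentence can be checked directly. The sketch below assumes that "pair" means an unordered pair of distinct fiducial measurements; this reading is our assumption and is not spelled out in the text, and the function name is hypothetical. Under that reading, starting from $N^2$ fiducial measurements and adding two for each pair gives $N^2 + N^2(N^2 - 1) = N^4$.

```python
from math import comb


def omega_sizes(n):
    """Return (classical, quantum) sizes of the local fiducial set.

    Assumption: the classical set has n**2 elements and the quantum set is
    obtained by adding two new elements for every unordered pair of distinct
    classical elements.
    """
    classical = n ** 2
    pairs = comb(classical, 2)        # unordered pairs of distinct elements
    quantum = classical + 2 * pairs   # two new fiducial r's per pair
    return classical, quantum


if __name__ == "__main__":
    for n in (2, 3, 4):
        classical, quantum = omega_sizes(n)
        print(n, classical, quantum, quantum == n ** 4)
```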

The second key difference is that QT has a continuous set of reversible 
transformations whereas CProbT has only a discrete set. This has the effect of 
filling out the space of pure states in QT. In QT we have unitary transformations. 
This is unlikely to survive in QG since we do not have a fixed background to 
evolve the state with respect to. Thus, information about the state is likely 
to leak out into the degrees of freedom which represent our frame of reference. 
For a transformation to be reversible on the other hand we require that no 
information about the state leaks out. However, we may have something which 
very well approximates reversible unitary transformations for sufficiently small
regions of space-time.

In the above discussion we used $g_{\mu\nu}$ and $T_{\mu\nu}$. However, operational quantities
would be better than these. Operational quantities would involve the meeting 
of test particles, the behaviour of rays of light, and so on |2l)l 12 II 122] . 

35 The universal causaloid 

We have defined a few notions of causaloid: (i) the causaloid for a predictively 
well defined region R; (ii) the open causaloid; and (iii) the causaloid for the 
whole pack V (this may not actually exist). We now wish to introduce a further 
notion - the universal causaloid. The motivation for this is to remove the need 
for repeatability. We repeat an experiment many times and bundle the cards 
from each run together forming a stack. The fact that we are able to bundle the 
cards separately indicates that, actually, there is some additional marker which 
could constitute recorded data that distinguishes the cards from one run to the 
next. For example, in the case of the probes floating in space illustrated earlier,
we reset the clocks and take note of the fact that we have done this. All this
is rather artificial. Why should we bundle our data into stacks coming from 
different repetitions of an experiment? It would, one might imagine, be more 
natural simply to collect one big stack of data which might or might not be 
regarded as coming from repetitions of an experiment. The problem with this is 
that it is unclear how we might interpret probabilities. The following approach 
seems reasonable. Associated with any proposition A concerning the data that 
might be collected is some vector $r_A$. In testing the data to see whether A is
true we will be testing its truth among a complete set of mutually exclusive
propositions $A, A', \ldots, A''$. We define $r^I_A = r_A + r_{A'} + \cdots + r_{A''}$. We will say
that A is true if

$$r_A \approx r^I_A \tag{143}$$

We use the symbol $\approx$ because we can never expect experimental data to give
absolute support for a proposition. We can decide in advance just how exactly 
equal we require these two vectors to be. 

To illustrate how this can apply to the case of probabilities consider the 
vectors 

$$r_n = r_{(X_{1n}, F_{1n})} \otimes^\Lambda r_{(X_{2n}, F_{2n})} \tag{144}$$

pertaining to the disjoint regions $R_n = R_{1n} \cup R_{2n}$ for n = 1 to N where N
is big. Further, define

$$r^I_n = r_{(Y_{1n}, F_{1n})} \otimes^\Lambda r_{(X_{2n}, F_{2n})} \tag{145}$$

We define $\bar{r}_n = r^I_n - r_n$. Now assume that

$$r_n = p\, r^I_n \tag{146}$$

for all n. We see that $r_n$ is like v and $r^I_n$ is like u as introduced earlier. In our previous
language we would say that this means that the probability of $X_{1n}$ in $R_{1n}$, given
that we see $X_{2n}$ in $R_{2n}$ and perform procedure $F_{1n} \cup F_{2n}$ in $R_n$, is p. But we can
turn this into a statement in the form of (143). Thus, consider

$$r_A = \sum_{(p-\Delta p)N < |S| < (p+\Delta p)N} \left( \bigotimes^{\Lambda}_{n \in S} r_n \right) \otimes^\Lambda \left( \bigotimes^{\Lambda}_{n \in \bar{S}} \bar{r}_n \right) \tag{147}$$

This is the vector corresponding to the property that pN out of the N regions
$R_n$ have this outcome to within $\pm\Delta p N$. We also have

$$r^I_A = \bigotimes^{\Lambda}_{n} r^I_n \tag{148}$$

Using (146) we obtain

$$r_A = \left( \sum_{(p-\Delta p)N < |S| < (p+\Delta p)N} \left[ p^n (1-p)^{N-n} \right] \right) r^I_A \tag{149}$$

where $n = |S|$ in each term of the sum. From the theory of binomial distributions we have

$$\sum_{(p-\Delta p)N < |S| < (p+\Delta p)N} \left[ p^n (1-p)^{N-n} \right] \approx 1 - O(1/[\Delta p \sqrt{N}]) \tag{150}$$

Hence, if N is sufficiently large, condition (143) is satisfied and we can say that
the proposition is true. 
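
The concentration used in (150) is easy to check numerically. In the sketch below the sum over subsets S is grouped by size k = |S|, which supplies the binomial coefficient for each k; the function names and the choice p = 0.5, Δp = 0.05 are illustrative assumptions. Logarithms are used so that large N does not underflow.

```python
from math import exp, lgamma, log


def log_binomial_pmf(k, n_regions, p):
    """Log of C(N, k) p^k (1 - p)^(N - k), valid for 0 < p < 1."""
    log_comb = (lgamma(n_regions + 1)
                - lgamma(k + 1)
                - lgamma(n_regions - k + 1))
    return log_comb + k * log(p) + (n_regions - k) * log(1.0 - p)


def weight_within(p, delta_p, n_regions):
    """Total weight of subset sizes k with (p - dp) N < k < (p + dp) N.

    This is the scalar multiplying the identity vector in (149) once the
    subsets S are grouped by their size k.
    """
    lower = (p - delta_p) * n_regions
    upper = (p + delta_p) * n_regions
    return sum(exp(log_binomial_pmf(k, n_regions, p))
               for k in range(n_regions + 1)
               if lower < k < upper)


if __name__ == "__main__":
    for n_regions in (100, 1000, 10000):
        print(n_regions, weight_within(0.5, 0.05, n_regions))
```

The printed weights approach 1 as N grows, which is the sense in which condition (143) becomes satisfied for sufficiently large N.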

This means that we do not need to make repeatability intrinsic to the def- 
inition of the causaloid. Rather, we can simply define a universal causaloid so 
that we can look for properties that are true (to within some small error). We 
define 

The universal causaloid for a region made up of elementary regions
$R_x$ is, if it exists, defined to be that thing represented by any
mathematical object which can be used to calculate vectors $r_A$ for
any proposition A concerning the data collected in these elementary
regions such that if the proposition is true (to within some
small error) we have $r_A \approx r^I_A$ where $r^I_A = r_A + r_{A'} + \cdots + r_{A''}$ and
$A, A', \ldots, A''$ is a complete set of mutually exclusive propositions.

We see that using this object we can recover the notion that we have probabilities 
by using the argument above in reverse (though see |2S1 for a cautionary tale 
on this subject). However, the universal causaloid is potentially a richer object. 
We can formulate many questions about the data as "is proposition A true?" 
pertaining to situations where we have not repeated the same experiment many 
times. The universal causaloid should enable us to answer all such questions. 

One problem with the universal causaloid is that we cannot directly measure 
it (unlike the earlier causaloids we defined) since we do not have repeatability. 
It is repeatability that allows us to obtain probabilities for different procedures 
and hence calculate the lambda matrices. However, we can suppose that there 
are certain symmetries and deduce the causaloid that way. The causaloid for 
CProbT and QT will simply be that found by allowing a causaloid diagram 
such as that in Fig. HJc) to extend indefinitely to include possible repetitions 
of the experiment. We may imagine that the causaloid extends indefinitely 
into the future and arbitrarily far into the past. Indeed, we might think of 
the universal causaloid as corresponding to the entire history of the universe 
(this would be essential if we want to consider cosmology). However, we have 
the problem that we cannot expect to collect cards from such an arbitrarily 
large region and send them to a sealed room. The universal causaloid, as a 
mathematical object, transcends the limited domain in which the causaloid was 
first conceived. This can be regarded as removing some of the operational 
scaffolding we had originally erected to help find a more general probability 
calculus. Having removed this scaffolding we are still able to use the universal 
causaloid to make predictions. 
36 The principle of counterfactual indifference



As we develop the causaloid formalism we should look for simplifying assump- 
tions. One possible such assumption is the following 

The principle of counterfactual indifference states that the 
probability of E does not depend on what action would have been 
implemented had E' happened instead if we condition on cases where 
E' did not happen (as long as the device implementing this action 
is low key). 

For example, imagine Alice tosses a die and then a coin. Then the probability that
the coin comes up heads cannot depend on the fact that, had a six come up, she
intended to bend the coin in a particular way, if we only consider those cases
where a six did not come up. Indeed, in a different procedure she might have
intended not to bend the coin had the six come up. If the principle of
counterfactual indifference were false in this case, then somebody could deduce Alice's
intention from data in which a six never comes up and where she never
implements her intention. But this would contradict the principle of indifference to
data since such intentions are part of the programming and correspond to the
detail of the way in which information is stored (in this case Alice's intentions as 
manifested in her brain). And indeed a little thought shows that the principle 
of counterfactual indifference is a consequence of the principle of indifference 
to data (given that low key physical systems are used to process data). The 
principle of counterfactual indifference implies 

$$r_{(X_1, F_1)} = r_{(X_1, F_1')} \quad \text{where } X_1 \subseteq F_1, F_1' \tag{151}$$

since both procedures $F_1$ and $F_1'$ amount to doing the same thing if we see $X_1$.
We do, indeed, have this property in CProbT and QT. 
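
The die-and-coin example can be worked through exactly in the classical case. The sketch below is an illustration under our own assumptions: a fair die, a coin that is fair unless bent, and a plan summarised by a single hypothetical parameter giving the heads probability Alice would arrange if a six came up. Conditioned on no six, the heads probability is the same for every plan, while the unconditioned probability does depend on the plan.

```python
from fractions import Fraction

FAIR = Fraction(1, 2)


def joint_distribution(p_heads_if_six):
    """Return {(face, coin): probability} for a fair die then a coin toss.

    p_heads_if_six encodes Alice's plan: the heads probability of the coin
    she would use if, and only if, the die shows a six.
    """
    dist = {}
    for face in range(1, 7):
        p_heads = p_heads_if_six if face == 6 else FAIR
        dist[(face, "H")] = Fraction(1, 6) * p_heads
        dist[(face, "T")] = Fraction(1, 6) * (1 - p_heads)
    return dist


def p_heads_given_no_six(p_heads_if_six):
    """Probability of heads conditioned on the die not showing a six."""
    dist = joint_distribution(p_heads_if_six)
    no_six = sum(p for (face, _), p in dist.items() if face != 6)
    heads_no_six = sum(p for (face, coin), p in dist.items()
                       if face != 6 and coin == "H")
    return heads_no_six / no_six


def p_heads_overall(p_heads_if_six):
    """Unconditioned probability of heads, which does depend on the plan."""
    dist = joint_distribution(p_heads_if_six)
    return sum(p for (_, coin), p in dist.items() if coin == "H")


if __name__ == "__main__":
    for plan in (Fraction(1, 2), Fraction(9, 10)):
        print(plan, p_heads_given_no_six(plan), p_heads_overall(plan))
```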

37 Comparison with other approaches to QG 

The approach outlined in this paper aims at finding a framework for a theory 
of QG. As such, it is quite possible that other approaches to QG will fit within 
this framework. However, there are two key aspects of the causaloid framework 
which should be compared with other approaches. 

1. We deal with data that may be collected in actual experiments. The
approach here is "top down" rather than "bottom up". All other approaches
to QG start with some ideas about the structure of space and time at very 
small scales (usually the Planck scale) and then attempt to build up. 

2. The causaloid formalism is more general than quantum theory. We can 
attempt to treat QT and GR in an even handed way rather than requiring 
GR to fit fully in the framework of QT. Most other approaches to QG take 
the basic form of QT unchanged. 
The two main approaches to QG are string theory [21] and quantum loop gravity
12312111221 though there are other approaches. 

String theory assumes a fixed non-dynamical background space-time and 
attempts to obtain a perturbative version of quantum gravity (along with the 
other fundamental forces). It is difficult to see how, when we go beyond the 
perturbative domain, this approach could have truly dynamic causal structure
as it must. The basic picture appears to be that of unitary evolution in standard 
quantum theory. Thus, string theory fits in the usual quantum framework. 

Quantum loop gravity is a canonical approach. A canonical formulation of
GR is quantized. Thus, it is fundamentally a 3 + 1 approach and this appears 
to be the conceptual origin of some of the mathematical problems faced by 
the program. In treating space and time on a different footing we break the 
elegance of Einstein's fundamentally covariant approach (this seems to be ok 
for canonical QG if we adopt the Newton-Cartan approach where $c \to \infty$ and
we have natural 3 + 1 splitting j2H])- Rather than forcing GR into a canonical 
framework it seems more natural to require that QT be put in a manifestly 
covariant framework. This appears to be problematic since the notion of a state 
across space evolving in time is basic to the usual formulation of QT. However, 
the causaloid framework allows us to go to a more fundamental formalism in 
which we do not have a state evolving in time. 

A new approach emerging from the theory of quantum loop gravity is the
spin-foam approach (see [29] for a short review). In this approach spin-foams
represent four-dimensional histories in space-time. The evolution between two
times is represented by an amplitude weighted sum over such spin-foams. In
this approach we see graphs dressed with matrices. However, the causaloid 
diagrams are quite different since they are fixed for a given theory. The notion 
of a history is clearly better than that of a state at a given time so far as 
providing a manifestly covariant formulation is concerned. However, a history 
is a rather big thing. It involves all possible events between the two times 
of interest. The causaloid formalism deals with matrices between elementary 
regions. In the case that there exist RULES we may only need to specify local 
lambda matrices and lambda matrices for pairs of regions (as in QT). This is 
closer to Einstein's original approach than providing an amplitude for an entire 
history is. 

Another approach is the causal set approach of Sorkin [30]. In this approach
the fundamental notion is of points with causal relations between them. Causal 
sets are taken to obey certain axioms so that they form partially ordered sets. 
These sets are represented by a graph with nodes joined by links. The partially 
ordered sets of the causal set approach are actually very different to the causaloid 
for various reasons. First, these partially ordered sets are supposed to provide 
a picture of space-time at the Planck scale. Second, the links are not dressed 
with matrices and so the causal relationship between the points is not as rich as
that between elementary regions in the causaloid formalism. And third, a given 
causal set is meant to provide one possible history (rather like a spin-foam) 
whereas the causaloid is a fixed object (though one which contains a way of 
calculating probabilities for all histories). 
A more recent approach is due to Lloyd [31]. He suggests that quantum
gravity should be modelled by a quantum computation. In so doing he is able 
to implement dynamic causal structure. However, since the whole process is 
embedded in standard unitary quantum theory, there is still a background time. 

Another approach is non-commutative geometry pioneered by Connes [32].
The basic idea is that operators like x, y, and z, at a point do not commute. 
This appears to fit within the framework of QT as we have defined it (since 
we said nothing about commutation relations). In standard QT the operators 
$A \otimes I$ and $I \otimes B$ will commute by virtue of the way the tensor product is defined.
In the causaloid formalism the analogous objects will not, in general, commute
if $\otimes$ is replaced by $\otimes^\Lambda$. In this sense there may be a connection between the
causaloid formalism and non-commutative geometry. 
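
The statement about the standard tensor product can be verified with a few lines of code. The sketch below covers only the ordinary Kronecker product of small matrices; it does not model the causaloid product $\otimes^\Lambda$, for which no such guarantee is claimed. The helper names and the example matrices are our own choices.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    """Ordinary matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(n):
    """The n by n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[0, 1], [1, 0]]
    B = [[1, 2], [3, 4]]
    I2 = identity(2)
    left = matmul(kron(A, I2), kron(I2, B))
    right = matmul(kron(I2, B), kron(A, I2))
    # A and B do not commute with each other, but their embeddings on
    # separate tensor factors do, and both orders give A (x) B.
    print(matmul(A, B) == matmul(B, A))
    print(left == right == kron(A, B))
```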

One problem which is common to most approaches which start with a Planck 
scale picture is that it is difficult to account for the four dimensional appearance 
of our world at a macroscopic level (Smolin calls this the "inverse problem").
Since the approach in this paper starts at the macroscopic level, it may
allow us to circumvent this problem in the same way Einstein does in GR. 
Thus, we would not attempt to prove that space-time is four dimensional at 
the macroscopic level but put this in by hand. This is not an option in Planck 
scale approaches to QG because the constraint that a four dimensional world 
emerges at the macroscopic scale has no obvious expression at the Planck scale.

The best approach, however, may be to combine an approach which posits 
some properties at a Planck scale with the causaloid approach. By working in 
both directions we might hope to constrain the theory in enough different ways 
that it becomes unique. 

38 Conclusions 

We have developed a framework for probabilistic theories which allow dynamic 
causal structure. Central to this is an object called the causaloid. This object 
is theory specific and we have calculated the causaloid for classical probability 
theory and quantum theory. We have not calculated it for GR though we 
presented some preliminary ideas. The results in this paper suggest the following 
program for finding a theory of QG. 

1. Formulate probabilistic general relativity (ProbGR) in the causaloid
framework. This will involve finding RULES to calculate the causaloid from a
basic set of lambda matrices.

2. Construct $\Omega_x$ for quantum gravity from the $\Omega_x$ of ProbGR in such a way
that we go from $|\Omega_x| = N^2$ to $|\Omega_x| = N^4$.

3. Find RULES for QG from the RULES for ProbGR. 

A particular issue we will have to pay attention to is that in ProbGR we have 
continuous space-time whereas we do not expect to be able to give operational 
support to this notion in QG. This may force us to be more radical in the
construction of QG than is suggested by the above three steps. 

The causaloid formalism contains some elements in common with the Aharonov, 
Bergmann, and Lebowitz (ABL) time-symmetric formulation of QT. Indeed,
it may even be regarded as a radical generalization of the ABL formulation.
The ABL approach has led to a number of fascinating results where naive
reasoning leads to counterintuitive though correct results. For example, if a par- 
ticle is tunnelling through a potential barrier then, when it is in the "forbidden" 
region, one might naively reason that its kinetic energy (total energy minus po- 
tential energy) is negative (even though it should always be positive). It turns 
out [33] that a certain type of measurement of the kinetic energy (called a weak 
measurement) will actually give negative readings if the state of the particle 
is preselected in its half evolved state and postselected in the forbidden region 
(using the ABL approach). The causaloid formalism might be expected to put 
such counterintuitive properties in an even more general setting and this may 
contribute to our understanding of them. 

The approach taken here attempts to combine the early operational philoso- 
phy of Einstein as applied to GR with the operationalism of Bohr as applied to 
QT (see for a discussion of how Einstein and Bohr might have engaged in a 
more constructive debate). We do this primarily for methodological reasons to 
obtain a mathematical framework which might be suitable for a theory of QG 
without committing ourselves to operationalism as a philosophy of physics. In 
fact it is interesting just how close this early philosophy of Einstein is to the 
later philosophy of Bohr. Einstein said 

The law of causality has not the significance of a statement as to the 
world of experience, except when observable facts ultimately appear 
as causes and effects [JJ- 

and 

All our space-time verifications invariably amount to a determina- 
tion of space-time coincidences. (...) Moreover, the results of our 
measurings are nothing but verifications of such meetings of the 
material points of our measuring instruments with other material 
points, coincidences between the hands of a clock and points on the 
clock dial, and observed point-events happening at the same place 
and the same time Pj. 

Bohr said 

However far the phenomena transcend the scope of classical physical 
explanation, the account of all evidence must be expressed in classi- 
cal terms. (...) The argument is simply that by the word experiment 
we refer to a situation where we can tell others what we have done 
and what we have learned and that, therefore, the account of the ex- 
perimental arrangements and of the results of the observations must 
be expressed in unambiguous language with suitable application of 
the terminology of classical physics 
While Einstein might have felt uncomfortable with the lack of ontological clarity
of Bohr's interpretation, there is a striking similarity between these sentiments. 
This underlines the power of operationalism as a methodology. 

The formalism here was developed specifically in the hope of finding a theory 
of QG. However it may find application in other areas, even outside physics. 
Indeed it may be useful in any situation where there is reason to believe that a 
straightforward analysis in terms of a state evolving through time is inadequate. 
An example might be where we are trying to model the behaviour of a system 
that is better able to predict the future than we are. The behaviour of such a 
system would appear to depend on the future in ways we could not account for 
in a purely forward in time way and the causaloid formalism might be useful 
here. One possible example of a system of this nature would be the financial 
markets. 